Abstract: Sparse superposition codes were recently introduced by Barron and Joseph for reliable communication over the AWGN channel at rates approaching the channel capacity. The codebook is defined in terms of a Gaussian design matrix, and codewords are sparse linear combinations of columns of the matrix. In this paper, we propose an approximate message passing decoder for sparse superposition codes, whose decoding complexity scales linearly with the size of the design matrix. The performance of the decoder is rigorously analyzed and it is shown to asymptotically achieve the AWGN capacity with an appropriate power allocation. Simulation results are provided to demonstrate the performance of the decoder at finite block lengths. We introduce a power allocation scheme to improve the empirical performance, and demonstrate how the decoding complexity can be significantly reduced by using Hadamard design matrices.

Index Terms: Sparse regression codes, capacity-achieving codes, AWGN channel, coded modulation, low-complexity decoding, compressed sensing

I. Introduction 

This paper considers the problem of constructing low-complexity, capacity-achieving codes for the memoryless additive white Gaussian noise (AWGN) channel. The channel generates output y from input x according to

$$
y = x + w, \tag{1}
$$

where the noise w is a Gaussian random variable with zero mean and variance $\sigma^2$. There is an average power constraint P on the input x: if $x_1, \ldots, x_n$ are transmitted over n uses of the channel, then we require that $\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P$. The signal-to-noise ratio $P/\sigma^2$ is denoted by snr. The goal is to construct codes with computationally efficient encoding and decoding, whose rates approach the channel capacity given by

$$
C := \tfrac{1}{2}\log(1 + \mathrm{snr}). \tag{2}
$$

Sparse superposition codes, also called Sparse Regression Codes (SPARCs), were recently introduced by Barron and Joseph [1], [2] for communication over the channel in (1). They proposed an efficient decoding algorithm called 'adaptive successive decoding', and showed that for any fixed rate R < C, the probability of decoding error decays to zero exponentially in $n/\log n$, where n is the block length of the code. Despite the strong theoretical performance guarantees, the rates achieved by this decoder for practical block lengths are significantly less than C. Subsequently, a soft-decision iterative decoder was proposed by Cho and Barron [3], [4], with theoretical guarantees similar to the earlier decoder in [2] but improved empirical performance for finite block lengths.

In this paper, we propose an approximate message passing (AMP) decoder for SPARCs. We analyze its performance and prove that the probability of decoding error goes to zero with growing block length for all fixed rates R < C. The decoding complexity is proportional to the size of the design matrix defining the code, which is a low order polynomial in n.

A. Approximate Message Passing (AMP)

"Approximate message passing" refers to a class of algorithms 0-03 that are Gaussian or quadratic approximations of loopy belief propagation algorithms (e.g., min-sum, sum-product) on dense factor graphs. AMP has proved particularly effective for the problem of reconstructing sparse signals from a small number of noisy linear measurements. This problem, commonly referred to as compressed sensing [13], is described by the measurement model

$$
y = A\beta + w. \tag{3}
$$
Here A is an $n \times N$ measurement matrix with n < N, $\beta \in \mathbb{R}^N$ is a sparse vector to be estimated from the observed vector $y \in \mathbb{R}^n$, and $w \in \mathbb{R}^n$ is the measurement noise. One popular class of algorithms to reconstruct $\beta$ is $\ell_1$-norm based convex optimization, e.g. [13]–[16]. Though these algorithms have strong theoretical guarantees and excellent empirical performance, the computational cost makes it challenging to implement the convex optimization procedures for problems where N is large. A fast AMP reconstruction algorithm for the model in (3) was proposed in [5]. Its empirical performance (for a large class of measurement matrices) was found to be similar to convex optimization based methods at significantly lower computational cost.

The factor graph corresponding to the model in (3) is dense, hence it is infeasible to implement message passing algorithms in which the messages are complicated real-valued functions. AMP circumvents this difficulty by passing only scalar parameters corresponding to these functions. For example, the scalars could be the mean and the variance if the functions are posterior distributions. The references [6], [8], [10], [11] describe how various flavors of AMP for the model in (3) can be obtained by approximating the standard message passing equations. These approximations reduce the message passing equations to a set of simple rules for computing successive estimates of $\beta$.

In [5], it was demonstrated via numerical experiments that the mean-squared reconstruction error of these estimates of $\beta$ could be tracked by a simple scalar iteration called state evolution. In [7], it was rigorously proved that the state evolution is accurate in the large system limit¹ for measurement matrices A with i.i.d. Gaussian entries.

In addition to compressed sensing, AMP has also been applied to a variety of related problems, e.g. [17]–[19]. We will not attempt a complete survey of the growing literature on AMP; the reader is referred to m-m for comprehensive lists of related work.

B. Contributions of the Paper 

• We propose an AMP decoder for sparse regression codes, 
which is derived via a first-order approximation of a min- 
sum-like message passing algorithm. 

• The main result of the paper is Theorem 1, in which we rigorously show that the probability of decoding error goes to zero as the block length tends to infinity, for all rates R < C.

• The performance of the decoder for finite block lengths 
is demonstrated via simulation results. We introduce a 
power allocation scheme that significantly improves the 
empirical performance for rates not close to C. We also 
show how the decoding complexity can be reduced by 
using Hadamard-based design matrices. 

To prove our main result, we use the framework of Bayati and Montanari, who in turn built on techniques introduced by Bolthausen [20]. However, we remark that the analysis of the proposed algorithm does not follow directly from the results in 0, (U). The main reason for this is that the undersampling ratio n/N in our setting goes to zero in the large system limit, whereas previous rigorous analyses of AMP consider the case where the undersampling ratio is a constant. This point, as well as other differences from the analysis in 0, 0, is discussed further in Section V-D.

C. Related work on communication with SPARCs 

The adaptive successive decoder of Joseph-Barron [2] and the iterative soft-decision decoder of Cho-Barron [3], [4] both have probability of error that decays exponentially in $n/\log n$ for any fixed rate R < C, but the latter has better empirical performance. Theorem 1 shows that the probability of error for the AMP decoder goes to zero for all R < C, but does not give a rate of decay; hence we cannot theoretically compare its performance with the Cho-Barron decoder in [3]. We can, however, compare the two decoders qualitatively.

Both the AMP and the Cho-Barron decoder generate a succession of estimates $\beta^1, \beta^2, \ldots$ for the message vector $\beta$ based on test statistics $s^0, s^1, \ldots$, respectively. At step t, the Cho-Barron decoder generates statistic $s^t$ based on an orthonormalization of the observed vector y and the previous 'fits' $A\beta^1, \ldots, A\beta^t$. In contrast, the test statistic in the AMP decoder is based on a modified version of the residual $(y - A\beta^t)$. Despite being generated in very different ways, the test statistics of the AMP and Cho-Barron decoders have a similar structure: they are asymptotically equivalent to an observation of $\beta$ corrupted by additive Gaussian noise whose variance decreases with t. However, the AMP statistic is faster to compute in each step, which makes it feasible to implement the decoder for larger block lengths.

¹The large system limit considered in [7] lets $n, N \to \infty$ with $n/N$ held constant.

An approximate message passing decoder for sparse superposition codes was recently proposed by Barbier and Krzakala in [22]. This decoder has different update rules from the AMP proposed here. A replica-based analysis of the decoder in [22] suggested it could not achieve rates beyond a threshold which was strictly smaller than C. Subsequently, Barbier et al. [23] reported empirical results which show that the performance of the decoder in [22] can be improved by using spatially coupled Hadamard matrices to define the code.

Finally, we mention that bit-interleaved coded modulation is a technique widely used for communication over AWGN channels. Some alternative approaches to designing high-rate codes for the AWGN channel are low-density lattice codes [25] and the recently proposed polar lattices [26].


D. Paper outline and Notation 

The paper is organized as follows. The SPARC construction is described in Section II. We describe the AMP channel decoder in Section III and provide some intuition about its iterations. We also show how the decoder can be derived as a first-order approximation to a min-sum-like message passing algorithm. Section IV contains the main result, which characterizes the performance of the AMP decoder for any rate R < C in the large system limit. In Section IV-A we present simulation results to demonstrate the performance of the decoder at finite block lengths. Section V contains the proof of the main result, and the proof of a key technical lemma is given in Section VI.

Notation: The $\ell_2$-norm of vector x is denoted by $\|x\|$. The transpose of a matrix B is denoted by $B^*$. The Gaussian distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{N}(\mu, \sigma^2)$. For any positive integer m, [m] denotes the set $\{1, \ldots, m\}$. The indicator function of an event $\mathcal{A}$ is denoted by $\mathbb{1}(\mathcal{A})$. $f(x) = o(g(x))$ means $\lim_{x \to \infty} f(x)/g(x) = 0$; $f(x) = \Theta(g(x))$ means $f(x)/g(x)$ asymptotically lies in an interval $[\kappa_1, \kappa_2]$ for some constants $\kappa_1, \kappa_2 > 0$. log and ln are used to denote logarithms with base 2 and base e, respectively. Rate is measured in bits.


II. The Sparse Regression Codebook 

A sparse regression code is defined in terms of a dictionary or design matrix A of dimension $n \times ML$, whose entries are i.i.d. $\mathcal{N}(0, \frac{1}{n})$. Here n is the block length, and M, L are integers whose values are specified below in terms of n and the rate R. As shown in Fig. 1, one can think of the matrix A as being composed of L sections with M columns each. Each codeword is a linear combination of L columns, with one column from each section. Formally, a codeword can be expressed as $A\beta$, where $\beta$ is an $ML \times 1$ vector $(\beta_1, \ldots, \beta_{ML})$ with the following property: there is exactly one non-zero $\beta_j$ for $1 \le j \le M$, one non-zero $\beta_j$ for $M + 1 \le j \le 2M$, and so forth. The non-zero value of $\beta$ in section $\ell \in [L]$ is set to $\sqrt{nP_\ell}$, where the positive constants $P_\ell$ satisfy $\sum_{\ell=1}^{L} P_\ell = P$. Denote the set of all $\beta$'s that satisfy this property by $\mathcal{B}_{M,L}(P_1, \ldots, P_L)$.

[Fig. 1 diagram: the matrix A split into Section 1, Section 2, ..., Section L, each with M columns; below it, $\beta = (0, \ldots, \sqrt{nP_1}, \ldots, 0 \mid 0, \ldots, \sqrt{nP_2}, \ldots, 0 \mid \cdots \mid 0, \ldots, \sqrt{nP_L}, \ldots, 0)$, with one non-zero entry per section.]

Fig. 1. A is an $n \times ML$ matrix and $\beta$ is a $ML \times 1$ vector. The positions of the non-zeros in $\beta$ correspond to the gray columns of A which combine to form the codeword $A\beta$.

Since each of the L sections contains M columns, the total number of codewords is $M^L$. To obtain a communication rate of R bits/sample, we need

$$
M^L = 2^{nR} \quad \text{or} \quad L \log M = nR. \tag{4}
$$

There are several choices for the pair (M, L) which satisfy (4). For example, L = 1 and $M = 2^{nR}$ recovers the Shannon-style random codebook in which the number of columns in A is $2^{nR}$. For our constructions, we will choose M equal to $L^a$, for some constant a > 0. In this case, (4) becomes

$$
aL \log L = nR. \tag{5}
$$

Thus $L = \Theta(n/\log n)$, and the size of the design matrix A (given by $n \times ML = n \times L^{a+1}$) now grows as $n^{2+a}/(\log n)^{a+1}$.

Encoding: The encoder splits its stream of input bits into segments of $\log M$ bits each. A length ML message vector $\beta_0$ is indexed by L such segments: the decimal equivalent of segment $\ell$ determines the position of the non-zero coefficient in section $\ell$ of $\beta_0$. The input codeword is then computed as $x = A\beta_0$; note that computing x simply involves adding L columns of A, weighted by the appropriate coefficients.

Power Allocation: The power allocation $\{P_\ell\}_{\ell=1,\ldots,L}$ plays an important role in determining the performance of the decoder. We will consider allocations where $P_\ell = \Theta(\frac{1}{L})$. Two examples are:

• Flat power allocation across sections: $P_\ell = \frac{P}{L}$, $\ell \in [L]$.

• Exponentially decaying power allocation: Fix parameter $\kappa > 0$. Then $P_\ell \propto 2^{-\kappa \ell / L}$, $\ell \in [L]$.

We use the exponentially decaying allocation with $\kappa = 2C$ for Theorem 1. In Section IV-A we discuss other power allocations, and find that an appropriate combination of exponential and flat allocations yields good decoding performance at finite block lengths.

Both the design matrix A and the power allocation $\{P_\ell\}$ are known to the encoder and the decoder before communication begins.

Some more notation: In the analysis, we will treat the message as a random vector $\beta$, which is uniformly distributed over $\mathcal{B}_{M,L}(P_1, \ldots, P_L)$, the set of length ML vectors that have a single non-zero entry $\sqrt{nP_\ell}$ in section $\ell$, for $\ell \in [L]$. We will denote the true message vector by $\beta_0$; $\beta_0$ should be understood as a realization of the random vector $\beta$.

We will use indices i, j to denote specific entries of $\beta$, while the index $\ell$ will be used to denote the entire section $\ell$ of $\beta$. Thus $\beta_i, \beta_j$ are scalars, while $\beta_\ell$ is a length M vector. We also set N = ML.

The performance of the SPARC decoder will be characterized in the limit as the dictionary size goes to $\infty$. We write $\lim x$ to denote the limit of the quantity x as the SPARC parameters $n, L, M \to \infty$ simultaneously, according to $M = L^a$ and $aL \log L = nR$.
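The construction and encoding steps of this section can be illustrated with a short NumPy sketch. It is not the authors' code: it assumes M is a power of two (so each section is indexed by exactly $\log M$ bits), uses the exponentially decaying allocation with $\kappa = 2C$ (normalized as in (28) below), and picks small illustrative values (L = 16, M = 4, n = 32, snr = 15, $\sigma^2 = 1$). The helper names `exp_power_allocation` and `sparc_encode` are hypothetical.

```python
import numpy as np

def exp_power_allocation(L, P, C):
    """P_l = P (2^{2C/L} - 1) / (1 - 2^{-2C}) * 2^{-2Cl/L}, l = 1..L (sums to P)."""
    ell = np.arange(1, L + 1)
    const = P * (2.0 ** (2 * C / L) - 1) / (1 - 2.0 ** (-2 * C))
    return const * 2.0 ** (-2 * C * ell / L)

def sparc_encode(bits, A, Pl, M):
    """Map L*log2(M) bits to beta_0 (one non-zero per section) and x = A beta_0."""
    n = A.shape[0]
    L = len(Pl)
    logM = int(np.log2(M))
    assert len(bits) == L * logM
    beta0 = np.zeros(M * L)
    for ell in range(L):
        segment = bits[ell * logM:(ell + 1) * logM]
        pos = int("".join(str(int(b)) for b in segment), 2)  # decimal value of the segment
        beta0[ell * M + pos] = np.sqrt(n * Pl[ell])
    return beta0, A @ beta0

# Illustrative parameters (assumed, not taken from the paper's experiments)
rng = np.random.default_rng(0)
L, M, n, sigma2, snr = 16, 4, 32, 1.0, 15.0
P = snr * sigma2
C = 0.5 * np.log2(1 + snr)          # capacity (2) in bits: 2.0 here
R = L * np.log2(M) / n              # rate from L log M = nR: 1.0 here
A = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, M * L))   # i.i.d. N(0, 1/n)
Pl = exp_power_allocation(L, P, C)
bits = rng.integers(0, 2, size=L * int(np.log2(M)))
beta0, x = sparc_encode(bits, A, Pl, M)
```

Because only L entries of $\beta_0$ are non-zero, the same codeword could be formed by summing just those L weighted columns of A; the dense product is used here only for brevity.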


III. The AMP Channel Decoder 


Given the received vector $y = A\beta_0 + w$, the AMP decoder generates successive estimates of the message vector, denoted by $\{\beta^t\}$, where $\beta^t \in \mathbb{R}^N$ for t = 1, 2, .... Set $\beta^0 = 0$, the all-zeros vector. For t = 0, 1, ..., compute

$$
z^t = y - A\beta^t + \frac{z^{t-1}}{\tau_{t-1}^2}\left(P - \frac{\|\beta^t\|^2}{n}\right), \tag{6}
$$

$$
\beta^{t+1}_i = \eta^t_i\left(\beta^t + A^* z^t\right), \quad \text{for } i = 1, \ldots, N = ML, \tag{7}
$$

where quantities with negative indices are set equal to zero. The constants $\{\tau_t\}$ and the estimation functions $\eta^t_i(\cdot)$ are defined as follows for t = 0, 1, ....

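Define

$$
\tau_0^2 = \sigma^2 + P, \qquad \tau_{t+1}^2 = \sigma^2 + P(1 - x_{t+1}), \quad t \ge 0, \tag{8}
$$

where

$$
x_{t+1} = \sum_{\ell=1}^{L} \frac{P_\ell}{P}\, \mathbb{E}\left[ \frac{ e^{\frac{\sqrt{nP_\ell}}{\tau_t}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau_t}\right)} }{ e^{\frac{\sqrt{nP_\ell}}{\tau_t}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau_t}\right)} + \sum_{j=2}^{M} e^{\frac{\sqrt{nP_\ell}}{\tau_t} U_j^\ell} } \right]. \tag{9}
$$

In (9), $\{U_j^\ell\}$ are i.i.d. $\mathcal{N}(0,1)$ random variables for $j \in [M]$, $\ell \in [L]$.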
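Since (8)–(9) are computed offline, a Monte Carlo evaluation is straightforward. The sketch below estimates $x_{t+1}$ by sampling the $U_j^\ell$ separately for each section; the sample size, seed, the convention $x_0 = 0$ (matching $\beta^0 = 0$) and the helper name `state_evolution_mc` are illustrative assumptions. The in-section ratio in (9) is computed after subtracting the row maximum from the exponents, which leaves the ratio unchanged but avoids overflow.

```python
import numpy as np

def state_evolution_mc(Pl, sigma2, n, M, T, num_samples=2000, seed=0):
    """Monte Carlo evaluation of (8)-(9); returns x_0..x_T and tau_0^2..tau_T^2."""
    rng = np.random.default_rng(seed)
    P = Pl.sum()
    xs, tau2 = [0.0], [sigma2 + P]      # x_0 = 0 by convention, tau_0^2 = sigma^2 + P
    for _ in range(T):
        tau = np.sqrt(tau2[-1])
        x_next = 0.0
        for P_ell in Pl:
            c = np.sqrt(n * P_ell) / tau
            U = rng.standard_normal((num_samples, M))
            expo = c * U                 # exponents c * U_j for the wrong entries
            expo[:, 0] += c * c          # true (first) entry: c * (U_1 + c)
            expo -= expo.max(axis=1, keepdims=True)
            w = np.exp(expo)
            x_next += (P_ell / P) * np.mean(w[:, 0] / w.sum(axis=1))
        xs.append(x_next)
        tau2.append(sigma2 + P * (1 - x_next))   # state evolution (8)
    return np.array(xs), np.array(tau2)

# Illustrative use with an exponentially decaying allocation (assumed parameters)
L, M, n, sigma2, snr = 16, 4, 32, 1.0, 15.0
P, C = snr * sigma2, 0.5 * np.log2(1 + snr)
ell = np.arange(1, L + 1)
Pl = 2.0 ** (-2 * C * ell / L)
Pl *= P / Pl.sum()
xs, tau2 = state_evolution_mc(Pl, sigma2, n, M, T=5)
```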

The notation $j \in \mathrm{sec}(\ell)$ will be used as shorthand for "index j in section $\ell$", i.e., $j \in \{(\ell - 1)M + 1, \ldots, \ell M\}$ where $\ell \in [L]$. For $i \in [N]$ such that $i \in \mathrm{sec}(\ell)$, define

$$
\eta^t_i(s) = \sqrt{nP_\ell}\, \frac{ e^{s_i \sqrt{nP_\ell}/\tau_t^2} }{ \sum_{j \in \mathrm{sec}(\ell)} e^{s_j \sqrt{nP_\ell}/\tau_t^2} }. \tag{10}
$$

Notice that $\eta^t_i(s)$ depends on all the components of s in the section containing i. For brevity, the argument of $\eta^t_i$ in (7) is written as $\beta^t + A^* z^t$, with the understanding that only the components in the section containing i play a role in computing $\eta^t_i$.
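Because (10) is a softmax within each section, scaled by $\sqrt{nP_\ell}$, it can be evaluated for all N entries at once by reshaping the test statistic into an $L \times M$ array. In this sketch (the function name `eta` and the example numbers are assumptions), the per-section maximum is subtracted before exponentiating; this does not change (10) but avoids overflow when $\sqrt{nP_\ell}/\tau_t^2$ is large.

```python
import numpy as np

def eta(s, tau2, Pl, n, M):
    """Evaluate eta_i^t(s) of (10) for all i; s has length N = M*L, tau2 = tau_t^2."""
    L = len(Pl)
    scale = np.sqrt(n * np.asarray(Pl))[:, None]   # sqrt(n P_l), one row per section
    expo = s.reshape(L, M) * scale / tau2          # s_j sqrt(n P_l) / tau_t^2
    expo -= expo.max(axis=1, keepdims=True)        # softmax is shift-invariant
    w = np.exp(expo)
    return (scale * w / w.sum(axis=1, keepdims=True)).ravel()

# Example: L = 2 sections of M = 3 entries (illustrative numbers)
n, M = 10, 3
Pl = np.array([0.6, 0.4])
s = np.array([3.0, 0.1, -0.2, 0.0, 0.3, 2.5])
out = eta(s, tau2=1.0, Pl=Pl, n=n, M=M)
```

Each section of the output sums to $\sqrt{nP_\ell}$, so dividing section $\ell$ by $\sqrt{nP_\ell}$ gives a probability vector over the M candidate positions.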

Before running the AMP decoder, the constants $\{\tau_t\}$ must be iteratively computed using (8) and (9). This is an offline computation: for given values of M, L, n, the expectations in (9) can be computed via Monte Carlo simulation. The relation (8), which describes how $\tau_{t+1}$ is obtained from $\tau_t$, is called state evolution, following the terminology in [5], [7]. In Section IV (Lemmas 1 and 2), we derive closed form expressions for $x_t$ and $\tau_t^2$ as $n \to \infty$ for each $t \ge 0$, which we denote by $\bar{x}_t$ and $\bar{\tau}_t^2$. In Section IV it is shown that for an appropriately chosen power allocation, $\bar{x}_t$ strictly increases with t until it reaches 1 in a finite number of steps $T^*$ for any fixed R < C. (For the exponentially decaying allocation used in Theorem 1, $T^* = \lceil 2C/\log(C/R) \rceil$, as given in (33).)

The AMP decoder is run for $T^*$ steps, and iteratively computes estimates $\beta^1, \ldots, \beta^{T^*}$ of the message vector using (6) and (7). Finally, in each section $\ell$ of $\beta^{T^*}$, set the maximum value to $\sqrt{nP_\ell}$ and the remaining entries to 0 to obtain the decoded message $\hat{\beta}$. Our main theoretical result (Theorem 1) characterizes the performance of the AMP decoder run for $T^*$ steps with the asymptotic values $\{\bar{\tau}_t^2\}_{t=0,\ldots,T^*}$.
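The decoding procedure described above can be sketched compactly in NumPy. As an assumption made to keep the sketch self-contained, it uses the asymptotic values $\bar{\tau}_t^2 = \sigma^2(1+\mathrm{snr})^{1-\xi_{t-1}}$ of Lemma 2 (as in the decoder (34)–(36) analyzed in Theorem 1) instead of Monte Carlo estimates of $\tau_t^2$, runs for $T^* = \lceil 2C/\log(C/R) \rceil$ steps, and then applies the section-wise hard decision. The function name `amp_decode` and the tiny demo parameters are illustrative; at such small sizes the decoder is not expected to recover every section.

```python
import numpy as np

def amp_decode(y, A, Pl, sigma2, M):
    """AMP iterations (6)-(7) with tau_t^2 replaced by its large-system limit."""
    n, N = A.shape
    L, P = len(Pl), Pl.sum()
    snr = P / sigma2
    C = 0.5 * np.log2(1 + snr)
    R = L * np.log2(M) / n                         # rate from L log M = nR
    assert R < C, "requires R < C"
    inc = np.log2(C / R) / (2 * C)                 # per-step increase of xi_t
    T = int(np.ceil(2 * C / np.log2(C / R)))       # T*
    tau2, xi_prev = [], 0.0                        # xi_{-1} = 0
    for _ in range(T):
        tau2.append(sigma2 * (1 + snr) ** (1 - xi_prev))
        xi_prev = min(xi_prev + inc, 1.0)
    scale = np.sqrt(n * Pl)[:, None]
    beta, z = np.zeros(N), np.zeros(n)
    for t in range(T):
        # residual with Onsager correction (6); z^{-1} = 0
        onsager = 0.0 if t == 0 else (z / tau2[t - 1]) * (P - beta @ beta / n)
        z = y - A @ beta + onsager
        # test statistic beta^t + A^* z^t, then section-wise softmax (7), (10)
        expo = (beta + A.T @ z).reshape(L, M) * scale / tau2[t]
        expo -= expo.max(axis=1, keepdims=True)
        w = np.exp(expo)
        beta = (scale * w / w.sum(axis=1, keepdims=True)).ravel()
    # hard decision: largest entry of each section set to sqrt(n P_l)
    B = beta.reshape(L, M)
    beta_hat = np.zeros((L, M))
    beta_hat[np.arange(L), B.argmax(axis=1)] = scale[:, 0]
    return beta_hat.ravel(), T

# Demo with assumed parameters: L = 16, M = 4, n = 24 (so R = 4/3), snr = 15
rng = np.random.default_rng(1)
L, M, n, sigma2, snr = 16, 4, 24, 1.0, 15.0
P, C = snr * sigma2, 0.5 * np.log2(1 + snr)
ell = np.arange(1, L + 1)
Pl = 2.0 ** (-2 * C * ell / L)
Pl *= P / Pl.sum()
A = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, M * L))
pos = rng.integers(0, M, size=L)                   # true non-zero position per section
beta0 = np.zeros(M * L)
beta0[np.arange(L) * M + pos] = np.sqrt(n * Pl)
y = A @ beta0 + rng.normal(0.0, np.sqrt(sigma2), size=n)
beta_hat, T = amp_decode(y, A, Pl, sigma2, M)
sec_err = np.mean(beta_hat.reshape(L, M).argmax(axis=1) != pos)
```

Using dense products with A costs O(nN) per iteration, matching the complexity stated for Gaussian design matrices.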



A. The Test Statistics $\beta^t + A^* z^t$

To understand the decoder let us first focus on (7), in which $\beta^{t+1}$ is generated from the test statistic

$$
s^t := \beta^t + A^* z^t. \tag{11}
$$

The AMP update step (7) is underpinned by the following key property of the test statistic: $s^t$ is asymptotically (as $n \to \infty$) distributed as $\beta + \bar{\tau}_t Z$, where $\bar{\tau}_t$ is the limit of $\tau_t$, and Z is an i.i.d. $\mathcal{N}(0,1)$ random vector independent of the message vector $\beta$. This property, which is proved in Section V, is due to the presence of the "Onsager" term

$$
\frac{z^{t-1}}{\tau_{t-1}^2}\left(P - \frac{\|\beta^t\|^2}{n}\right)
$$

in the residual update step (6). The reader is referred to [7, Section I-C] for intuition about the role of the Onsager term in the standard AMP algorithm.

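In light of the above property, a natural way to generate $\beta^{t+1}$ from $s^t = s$ is

$$
\beta^{t+1}(s) = \mathbb{E}[\beta \mid \beta + \tau_t Z = s], \tag{12}
$$

i.e., $\beta^{t+1}$ is the Bayes optimal estimate of $\beta$ given the observation $s^t = \beta + \tau_t Z$. For $i \in \mathrm{sec}(\ell)$, $\ell \in [L]$, we have

$$
\begin{aligned}
\beta^{t+1}_i(s) &= \mathbb{E}[\beta_i \mid \beta + \tau_t Z = s] \\
&= \mathbb{E}\left[\beta_i \mid \{\beta_j + \tau_t Z_j = s_j\}_{j \in \mathrm{sec}(\ell)}\right] \\
&= \sqrt{nP_\ell}\; \mathbb{P}\left(\beta_i = \sqrt{nP_\ell} \mid \{\beta_j + \tau_t Z_j = s_j\}_{j \in \mathrm{sec}(\ell)}\right) \\
&= \frac{\sqrt{nP_\ell}\, f\left(\{s_j\}_{j \in \mathrm{sec}(\ell)} \mid \beta_i = \sqrt{nP_\ell}\right) \mathbb{P}(\beta_i = \sqrt{nP_\ell})}{\sum_{k \in \mathrm{sec}(\ell)} f\left(\{s_j\}_{j \in \mathrm{sec}(\ell)} \mid \beta_k = \sqrt{nP_\ell}\right) \mathbb{P}(\beta_k = \sqrt{nP_\ell})},
\end{aligned}
\tag{13}
$$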

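where we have used Bayes' theorem with $f(\cdot \mid \beta_k = \sqrt{nP_\ell})$ denoting the joint density of $\{\beta_j + \tau_t Z_j\}_{j \in \mathrm{sec}(\ell)}$ conditioned on $\beta_k$ being the non-zero entry in section $\ell$. Since $\beta$ and Z are independent with Z having i.i.d. $\mathcal{N}(0,1)$ entries, for each $k \in \mathrm{sec}(\ell)$ we have

$$
\begin{aligned}
f\left(\{\beta_j + \tau_t Z_j = s_j\}_{j \in \mathrm{sec}(\ell)} \mid \beta_k = \sqrt{nP_\ell}\right) &\propto e^{-(s_k - \sqrt{nP_\ell})^2/2\tau_t^2} \prod_{j \in \mathrm{sec}(\ell),\, j \neq k} e^{-s_j^2/2\tau_t^2} \\
&= e^{s_k \sqrt{nP_\ell}/\tau_t^2}\, e^{-nP_\ell/2\tau_t^2} \prod_{j \in \mathrm{sec}(\ell)} e^{-s_j^2/2\tau_t^2}.
\end{aligned}
\tag{14}
$$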



Fig. 2. Progression of $\beta^t_{i_\ell}/\sqrt{nP_\ell}$ with t for various sections $\ell$, where $i_\ell$ is the correct term in section $\ell$. The SPARC parameters are L = 512, M = 1024, snr = 15, R = 0.7C, $P_\ell \propto 2^{-2C\ell/L}$. The figure shows the progression for a 'typical' simulation run of the AMP decoder, where there were no section errors after decoding. In 100 runs with the above SPARC parameters, a majority of runs resulted in no section errors, and over 95% of the runs had fewer than five section errors.

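Using (14) in (13), together with the fact that $\mathbb{P}(\beta_k = \sqrt{nP_\ell}) = \frac{1}{M}$ for each $k \in \mathrm{sec}(\ell)$, we obtain

$$
\eta^t_i(s) = \mathbb{E}[\beta_i \mid \beta + \tau_t Z = s] = \sqrt{nP_\ell}\, \frac{e^{s_i \sqrt{nP_\ell}/\tau_t^2}}{\sum_{j \in \mathrm{sec}(\ell)} e^{s_j \sqrt{nP_\ell}/\tau_t^2}}, \tag{15}
$$

which is the expression in (10).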
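The steps (13)–(15) can be checked numerically for a single section: computing the posterior by brute force from the Gaussian likelihoods and the uniform prior $1/M$ should reproduce the softmax in (15). The section size, the value of $nP_\ell$, $\tau_t$ and the seed below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M, nP, tau = 8, 9.0, 1.5              # section size, n*P_l and tau_t (illustrative)
amp = np.sqrt(nP)
true_idx = 3
beta_sec = np.zeros(M)
beta_sec[true_idx] = amp
s = beta_sec + tau * rng.standard_normal(M)   # one section of s = beta + tau Z

def gauss_pdf(v, mean, sd):
    return np.exp(-(v - mean) ** 2 / (2 * sd ** 2)) / np.sqrt(2 * np.pi * sd ** 2)

# Bayes' rule (13): likelihood of the whole section under "entry k is the non-zero"
lik = np.array([np.prod(gauss_pdf(s, amp * (np.arange(M) == k), tau)) for k in range(M)])
prior = np.full(M, 1.0 / M)
posterior_bayes = lik * prior / np.sum(lik * prior)

# Closed form (15): softmax of s_j sqrt(nP_l) / tau^2 (divided by sqrt(nP_l))
expo = s * amp / tau ** 2
w = np.exp(expo - expo.max())
posterior_softmax = w / w.sum()
```

The two vectors agree because the factors common to every hypothesis in (14), $e^{-nP_\ell/2\tau_t^2}$ and the product over all $e^{-s_j^2/2\tau_t^2}$, cancel in the normalization.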

Thus, under the distributional assumption that $s^t$ equals $\beta + \tau_t Z$, $\beta^{t+1}$ is the estimate of the message vector $\beta$ (based on $s^t$) that minimizes the expected squared estimation error. Further, for $i \in \mathrm{sec}(\ell)$, $\beta^{t+1}_i/\sqrt{nP_\ell}$ is the posterior probability of $\beta_i$ being the non-zero entry in section $\ell$, conditioned on the observation $s^t = \beta + \tau_t Z$. Fig. 2 shows the progression of $\beta^t_{i_\ell}/\sqrt{nP_\ell}$ with t for various sections $\ell$, where $i_\ell$ denotes the index of the true non-zero entry in section $\ell$. We see that the later sections (which are allocated less power) require a larger number of iterations for the posterior probability of the correct term in the section to transition to a value close to one. The iteration at which this transition occurs is determined by the state evolution equations (8) and (9), as discussed below.


B. State Evolution and its Consequences 

We now discuss the role of the quantity $x_{t+1}$ in the state evolution equations (8) and (9).


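Proposition 1. Under the assumption that $s^t = \beta + \tau_t Z$, where Z is i.i.d. $\sim \mathcal{N}(0,1)$ and independent of $\beta$, the quantity $x_{t+1}$ defined in (9) satisfies

$$
x_{t+1} = \frac{1}{nP}\mathbb{E}\left[\beta^* \beta^{t+1}\right], \qquad 1 - x_{t+1} = \frac{1}{nP}\mathbb{E}\left[\|\beta - \beta^{t+1}\|^2\right], \tag{16}
$$

and consequently, $\tau_{t+1}^2 = \sigma^2 + \frac{1}{n}\mathbb{E}\left[\|\beta - \beta^{t+1}\|^2\right]$.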

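Proof: For convenience of notation, we relabel the N i.i.d. random variables $\{Z_k\}_{k \in [N]}$ as $\{U_j^\ell\}_{j \in [M], \ell \in [L]}$. For any $\ell \in [L]$, $U_\ell$ denotes the length M vector $\{U_j^\ell\}_{j \in [M]}$, and U is the length N vector $\{U_\ell\}_{\ell \in [L]}$. We have

$$
\begin{aligned}
\frac{1}{nP}\mathbb{E}\left[\beta^* \beta^{t+1}\right] &= \frac{1}{nP}\mathbb{E}\left[\beta^* \eta^t(\beta + \tau_t U)\right] \\
&\overset{(a)}{=} \frac{1}{nP} \sum_{\ell=1}^{L} \sqrt{nP_\ell}\; \mathbb{E}\left[\eta^t_{\mathrm{sent}(\ell)}(\beta_\ell + \tau_t U_\ell)\right] \\
&\overset{(b)}{=} \sum_{\ell=1}^{L} \frac{P_\ell}{P}\, \mathbb{E}\left[ \frac{ e^{\frac{\sqrt{nP_\ell}}{\tau_t}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau_t}\right)} }{ e^{\frac{\sqrt{nP_\ell}}{\tau_t}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau_t}\right)} + \sum_{j=2}^{M} e^{\frac{\sqrt{nP_\ell}}{\tau_t} U_j^\ell} } \right] \\
&= x_{t+1}.
\end{aligned}
\tag{17}
$$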


In (a) above, the index of the non-zero term in section $\ell$ is denoted by $\mathrm{sent}(\ell)$. (b) is obtained by assuming that $\mathrm{sent}(\ell)$ is the first entry in section $\ell$; this assumption is valid because the prior on $\beta$ is uniform over $\mathcal{B}_{M,L}(P_1, \ldots, P_L)$.

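Next, consider

$$
\frac{1}{nP}\mathbb{E}\left[\|\beta - \beta^{t+1}\|^2\right] = 1 + \frac{\mathbb{E}\left[\|\beta^{t+1}\|^2\right] - 2\,\mathbb{E}\left[\beta^* \beta^{t+1}\right]}{nP}. \tag{18}
$$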


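Under the assumption that $s^t = \beta + \tau_t Z$, recall from Section III-A that $\beta^{t+1}$ can be expressed as $\beta^{t+1} = \mathbb{E}[\beta \mid s^t]$. We therefore have

$$
\begin{aligned}
\mathbb{E}\left[\|\beta^{t+1}\|^2\right] = \mathbb{E}\left[\|\mathbb{E}[\beta \mid s^t]\|^2\right] &= \mathbb{E}\left[(\mathbb{E}[\beta \mid s^t] - \beta + \beta)^*\, \mathbb{E}[\beta \mid s^t]\right] \\
&\overset{(a)}{=} \mathbb{E}\left[\beta^*\, \mathbb{E}[\beta \mid s^t]\right] = \mathbb{E}\left[\beta^* \beta^{t+1}\right],
\end{aligned}
\tag{19}
$$

where step (a) follows because $\mathbb{E}\left[(\mathbb{E}[\beta \mid s^t] - \beta)^*\, \mathbb{E}[\beta \mid s^t]\right] = 0$ due to the orthogonality principle. Substituting (19) in (18) and using (17) yields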


$$
\frac{1}{nP}\mathbb{E}\left[\|\beta - \beta^{t+1}\|^2\right] = 1 - \frac{\mathbb{E}\left[\beta^* \beta^{t+1}\right]}{nP} = 1 - x_{t+1}.
$$

The last claim then follows from (8).


Hence $x_{t+1}$ can be interpreted as the expectation of the (power-weighted) fraction of correctly decoded sections in step t + 1. We emphasize that this interpretation is accurate only in the limit as $n, M, L \to \infty$, when $s^t$ is distributed as $\beta + \bar{\tau}_t Z$, with $\bar{\tau}_t := \lim \tau_t$. In Section IV (Lemmas 1 and 2), we derive a closed-form expression for $\bar{x}_{t+1} := \lim x_{t+1}$ under an exponentially decaying power allocation of the form $P_\ell \propto 2^{-2C\ell/L}$. We show that for rates R < C,

$$
\bar{x}_t = \frac{(1 + \mathrm{snr}) - (1 + \mathrm{snr})^{1 - \xi_{t-1}}}{\mathrm{snr}}, \qquad \bar{\tau}_t^2 = \sigma^2 + P(1 - \bar{x}_t), \tag{20}
$$

for $t \ge 0$, where $\xi_{-1} = 0$ and

$$
\xi_t = \min\left\{ \frac{1}{2C}\log\left(\frac{C}{R}\right) + \xi_{t-1},\; 1 \right\}. \tag{21}
$$


A direct consequence of (20) and (21) is that $\bar{x}_t$ strictly increases with t until it reaches one, and the number of steps $T^*$ until $\bar{x}_{T^*} = 1$ is $T^* = \left\lceil \frac{2C}{\log(C/R)} \right\rceil$.





Fig. 3. Comparison of state evolution predictions with AMP performance. The SPARC parameters are M = 512, L = 1024, snr = 15, R = 0.7C, $P_\ell \propto 2^{-2C\ell/L}$. The average of the 200 trials (green curves) is the dashed red curve, which is almost indistinguishable from the state evolution prediction (black curve).


The constants $\{\xi_t\}_{t \ge 0}$ have a nice interpretation in the large system limit: at the end of step t + 1, the first $\xi_t$ fraction of sections in $\beta^{t+1}$ will be correctly decodable with high probability, i.e., the true non-zero entry in these sections will have almost all the posterior probability mass. The other $(1 - \xi_t)$ fraction of sections will not be correctly decodable from $\beta^{t+1}$ as the power allocated to these sections is not large enough. An additional $\frac{1}{2C}\log\left(\frac{C}{R}\right)$ fraction of sections become correctly decodable in each step until $T^*$, when all the sections are correctly decodable with high probability. Fig. 2 illustrates when various sections of $\beta$ become decodable for a finite-sized SPARC with L = 512, M = 1024, and R = 0.7C.

As $\bar{x}_t$ increases to 1, (20) implies that $\bar{\tau}_t^2$, the variance of the "noise" in the AMP test statistic, decreases monotonically from $\bar{\tau}_0^2 = \sigma^2 + P$ down to $\bar{\tau}_{T^*}^2 = \sigma^2$. In other words, the initial observation $y = A\beta + w$ is effectively transformed by the AMP decoder into a cleaner statistic $s^{T^*} = \beta + w'$, where $w'$ is Gaussian with the same variance as the measurement noise w.

To summarize, for any fixed R < C, when the AMP decoder is run for a finite number of steps $T^* = \left\lceil \frac{2C}{\log(C/R)} \right\rceil$, then in the large system limit $\lim \frac{1}{n}\mathbb{E}\|\beta - \beta^{T^*}\|^2$ equals zero.

For finite-sized dictionaries, the test statistic $s^t$ will not be precisely distributed as $\beta + \tau_t Z$. Nevertheless, computing $x_{t+1}$ numerically via the state evolution equations (8) and (9) yields an estimate for the expected weighted fraction of correctly decoded sections after each step. Figure 3 shows the trajectory of $(1 - x_t)$ vs t for a SPARC with the parameters specified in the figure. The empirical average of $1 - (\beta_0^* \beta^t)/(nP)$ matches almost exactly with $1 - x_t$. The theoretical limit $1 - \bar{x}_t$ given in (20) is also shown in the figure.


C. Derivation of the AMP 

We describe a min-sum-like message passing algorithm for SPARC decoding from which the AMP decoder is obtained as a first-order approximation. The aim is to highlight the similarities and differences from the derivation of the AMP in [7]. The derivation here is not required for the analysis in the remainder of the paper.

Consider the factor graph for the model $y = A\beta + w$, where $\beta \in \mathcal{B}_{M,L}(P_1, \ldots, P_L)$. Each row of A corresponds to a constraint (factor) node, while each column corresponds to a variable node. We use the indices a, b to denote factor nodes, and indices i, j to denote variable nodes. The AMP updates in (6)–(7) are obtained via a first-order approximation to the following message passing algorithm that iteratively computes estimates of $\beta$ from y.

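For $i \in [N]$, $a \in [n]$, set $\beta^0_{i \to a} = 0$, and compute the following for $t \ge 0$:

$$
z^t_{a \to i} = y_a - \sum_{j \in [N] \setminus i} A_{aj}\, \beta^t_{j \to a}, \tag{22}
$$

$$
\beta^{t+1}_{i \to a} = \eta^t_i\left(s_{i \to a}\right), \tag{23}
$$

where $\eta^t_i(\cdot)$ is the estimation function defined in (10), and for $i \in \mathrm{sec}(\ell)$, the entries of the test statistic $s_{i \to a} \in \mathbb{R}^M$ are defined as

$$
(s_{i \to a})_i = \sum_{b \in [n] \setminus a} A_{bi}\, z^t_{b \to i}, \qquad (s_{i \to a})_j = \sum_{b \in [n]} A_{bj}\, z^t_{b \to j}, \quad j \in \mathrm{sec}(\ell) \setminus i. \tag{24}
$$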


It is useful to compare the $\beta$-update in (23) to the message passing algorithm from which the traditional AMP is derived (cf. equation (1.2) in [7]). In [7], the vector x to be recovered is assumed to be i.i.d. across entries; hence we have a single estimating function $\eta_t$ in this case, which for $i \in [N]$, $a \in [n]$, generates the message

$$
x^{t+1}_{i \to a} = \eta_t\left( \sum_{b \in [n] \setminus a} A_{bi}\, z^t_{b \to i} \right). \tag{25}
$$

In (25), each outgoing message from the ith variable node depends only on its own incoming messages. In contrast, in (23) each outgoing message from a variable node depends on the incoming messages of all the other nodes in the same section. This is due to the constraint that $\beta$ has exactly one non-zero entry in each section, which ensures that entries of $\beta^t$ within each section are dependent, while entries in different sections are mutually independent.

The derivation of the AMP updates in (6)–(7) starting from the message passing algorithm (22)–(23) is given in Appendix A.


IV. Performance of the AMP Decoder 


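Before giving the main result, we state two lemmas that specify the limiting behaviour of the state evolution parameters defined in (8), (9). Treating $x_{t+1}$ in (9) as a function of $\tau_t$, we can define

$$
x(\tau) := \sum_{\ell=1}^{L} \frac{P_\ell}{P}\, \mathbb{E}\left[ \frac{ e^{\frac{\sqrt{nP_\ell}}{\tau}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau}\right)} }{ e^{\frac{\sqrt{nP_\ell}}{\tau}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau}\right)} + \sum_{j=2}^{M} e^{\frac{\sqrt{nP_\ell}}{\tau} U_j^\ell} } \right], \tag{26}
$$

where $\{U_j^\ell\}$ are i.i.d. $\sim \mathcal{N}(0,1)$ for $j \in [M]$, $\ell \in [L]$.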

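Lemma 1. For any power allocation $\{P_\ell\}_{\ell=1,\ldots,L}$ that is non-increasing with $\ell$, we have

$$
\bar{x}(\tau) := \lim x(\tau) = \lim \sum_{\ell=1}^{\lfloor \xi^*(\tau) L \rfloor} \frac{P_\ell}{P}, \tag{27}
$$

where $\xi^*(\tau)$ is the supremum of all $\xi \in (0, 1]$ that satisfy $\lim L P_{\lfloor \xi L \rfloor} > 2(\ln 2) R \tau^2$.

If $\lim L P_{\lfloor \xi L \rfloor} < 2(\ln 2) R \tau^2$ for all $\xi > 0$, then $\bar{x}(\tau) = 0$. (The rate R is measured in bits.)

Proof: In Appendix B. ■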

Since the entries of A are i.i.d., the assumption that $\{P_\ell\}$ is non-increasing with $\ell$ can be made without loss of generality. Recalling that $x_{t+1}$ is the expected power-weighted fraction of correctly decoded sections after step (t + 1), for any power allocation $\{P_\ell\}$, Lemma 1 may be interpreted as follows: in the large system limit, the sections $\ell \le \lfloor \xi^*(\tau_t) L \rfloor$, i.e., those whose power $\lim LP_\ell$ exceeds the threshold $2(\ln 2)R\tau_t^2$, will be correctly decoded in step (t + 1) (i.e., will have most of the posterior probability mass on the correct term); conversely, all sections whose power falls below the threshold will not be decodable in this step.
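For a finite L, Lemma 1 suggests a simple approximation of $x_{t+1}$: treat $LP_\ell$ as its limit and sum the power fractions of the sections whose $LP_\ell$ exceeds $2(\ln 2)R\tau^2$. The sketch below (the function `lemma1_fraction` and all numbers are assumptions for illustration) does this for a non-increasing allocation. For the exponential allocation with snr = 15, $\sigma^2 = 1$, R = 0.7C and $\tau^2 = \tau_0^2 = \sigma^2 + P$, the closed form (20) gives $\bar{x}_1 = (16 - 16 \cdot 0.7)/15 = 0.32$, and the finite-L approximation should land close to it.

```python
import numpy as np

def lemma1_fraction(Pl, R, tau2):
    """Finite-L version of (27): power fraction of sections with L*P_l > 2 ln(2) R tau^2."""
    Pl = np.asarray(Pl, dtype=float)
    assert np.all(np.diff(Pl) <= 0), "Lemma 1 assumes a non-increasing allocation"
    L, P = len(Pl), Pl.sum()
    decodable = L * Pl > 2 * np.log(2) * R * tau2   # a prefix of the sections
    return Pl[decodable].sum() / P

# Flat allocation: either every section crosses the threshold or none does
flat = np.full(8, 15.0 / 8)
x_low = lemma1_fraction(flat, R=0.4, tau2=16.0)
x_high = lemma1_fraction(flat, R=1.4, tau2=16.0)

# Exponential allocation (28) with snr = 15, sigma^2 = 1, R = 0.7C, at tau_0^2 = sigma^2 + P
L, sigma2, snr = 1024, 1.0, 15.0
P, C = snr * sigma2, 0.5 * np.log2(1 + snr)
ell = np.arange(1, L + 1)
Pl = P * (2 ** (2 * C / L) - 1) / (1 - 2 ** (-2 * C)) * 2.0 ** (-2 * C * ell / L)
x1 = lemma1_fraction(Pl, R=0.7 * C, tau2=sigma2 + P)
```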

The performance of the AMP decoder will be analyzed with the following exponentially decaying power allocation:

$$
P_\ell = P \cdot \frac{2^{2C/L} - 1}{1 - 2^{-2C}} \cdot 2^{-2C\ell/L}, \quad \ell \in [L]. \tag{28}
$$

For the power allocation in (28), we have for $\xi \in (0, 1]$

$$
\lim L P_{\lfloor \xi L \rfloor} = \sigma^2 (1 + \mathrm{snr})^{1 - \xi} \ln(1 + \mathrm{snr}). \tag{29}
$$
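The limit (29) can be sanity-checked numerically by evaluating $LP_{\lfloor \xi L \rfloor}$ from (28) for growing L and comparing it with $\sigma^2(1+\mathrm{snr})^{1-\xi}\ln(1+\mathrm{snr})$. The values of snr, $\sigma^2$ and $\xi$ below are arbitrary choices for illustration; the relative error should shrink as L grows.

```python
import numpy as np

sigma2, snr, xi = 1.0, 15.0, 0.3
P, C = snr * sigma2, 0.5 * np.log2(1 + snr)
limit = sigma2 * (1 + snr) ** (1 - xi) * np.log(1 + snr)     # right-hand side of (29)

rel_err = {}
for L in (2 ** 6, 2 ** 10, 2 ** 16):
    ell = int(np.floor(xi * L))                              # section index floor(xi L)
    P_ell = P * (2 ** (2 * C / L) - 1) / (1 - 2 ** (-2 * C)) * 2 ** (-2 * C * ell / L)
    rel_err[L] = abs(L * P_ell - limit) / limit
```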


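Lemma 2. For the power allocation $\{P_\ell\}$ given in (28), we have for t = 0, 1, ...:

$$
\bar{x}_t := \lim x_t = \frac{(1 + \mathrm{snr}) - (1 + \mathrm{snr})^{1 - \xi_{t-1}}}{\mathrm{snr}}, \tag{30}
$$

$$
\bar{\tau}_t^2 := \lim \tau_t^2 = \sigma^2 + P(1 - \bar{x}_t) = \sigma^2 (1 + \mathrm{snr})^{1 - \xi_{t-1}}, \tag{31}
$$

where $\xi_{-1} = 0$, and for $t \ge 0$,

$$
\xi_t = \min\left\{ \frac{1}{2C}\log\left(\frac{C}{R}\right) + \xi_{t-1},\; 1 \right\}. \tag{32}
$$

Proof: In Appendix C. ■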

We observe from Lemma 2 that $\xi_t$ increases in each step by $\frac{1}{2C}\log\left(\frac{C}{R}\right)$ until it equals 1. Also note that $\bar{\tau}_t^2$ strictly decreases with t until it reaches $\sigma^2$ (when $\xi_{t-1}$ reaches 1), after which it remains constant. Thus the number of steps until $\xi_t$ reaches one (i.e., $\bar{\tau}_t^2$ stops decreasing) equals

$$
T^* = \left\lceil \frac{2C}{\log(C/R)} \right\rceil. \tag{33}
$$
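The recursion of Lemma 2 and the step count (33) are easy to tabulate. The sketch below (the function name and the example snr = 15, $\sigma^2 = 1$, R = 0.7C are assumptions) computes $\bar{x}_t$, $\bar{\tau}_t^2$ and $T^*$. For these values $\xi_t$ grows by about 0.129 per step, so $T^* = \lceil 4/\log(1/0.7) \rceil = 8$, and $\bar{x}_1 = 0.32$ exactly.

```python
import numpy as np

def closed_form_se(snr, sigma2, R, t_max):
    """Lemma 2: x_bar_t and tau_bar_t^2 for t = 0..t_max, plus T* from (33)."""
    C = 0.5 * np.log2(1 + snr)
    assert 0 < R < C
    inc = np.log2(C / R) / (2 * C)            # increase of xi_t per step
    xs, taus, xi_prev = [], [], 0.0           # xi_{-1} = 0
    for _ in range(t_max + 1):
        xs.append(((1 + snr) - (1 + snr) ** (1 - xi_prev)) / snr)   # (30)
        taus.append(sigma2 * (1 + snr) ** (1 - xi_prev))            # (31)
        xi_prev = min(inc + xi_prev, 1.0)                           # (32)
    T_star = int(np.ceil(2 * C / np.log2(C / R)))                   # (33)
    return np.array(xs), np.array(taus), T_star

snr, sigma2 = 15.0, 1.0
C = 0.5 * np.log2(1 + snr)
xs, taus, T_star = closed_form_se(snr, sigma2, R=0.7 * C, t_max=10)
```

Comparing `xs` with a Monte Carlo evaluation of (8)–(9) at moderate L and M is one way to reproduce the kind of agreement shown in Fig. 3.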


Our main result is proved for the following AMP decoder, which uses the asymptotic values $\{\bar{\tau}_t^2\}$ defined in Lemma 2. Set $\beta^0 = 0$ and compute

$$
z^t = y - A\beta^t + \frac{z^{t-1}}{\bar{\tau}_{t-1}^2}\left(P - \frac{\|\beta^t\|^2}{n}\right), \tag{34}
$$

$$
\beta^{t+1}_i = \eta^t_i\left(\beta^t + A^* z^t\right), \quad \text{for } i \in [N], \tag{35}
$$

where for $i \in \mathrm{sec}(\ell)$, $\ell \in [L]$,

$$
\eta^t_i(s) = \sqrt{nP_\ell}\, \frac{ e^{s_i \sqrt{nP_\ell}/\bar{\tau}_t^2} }{ \sum_{j \in \mathrm{sec}(\ell)} e^{s_j \sqrt{nP_\ell}/\bar{\tau}_t^2} }. \tag{36}
$$

The only difference from the earlier decoder described in (6)–(10) is that we now use the limiting values $\{\bar{\tau}_t^2\}$ from Lemma 2 instead of $\{\tau_t^2\}$. The algorithm terminates after generating $\beta^{T^*}$, where $T^*$ is defined in (33). The decoded codeword $\hat{\beta} \in \mathcal{B}_{M,L}(P_1, \ldots, P_L)$ is obtained by setting the maximum of $\beta^{T^*}$ in each section $\ell$ to $\sqrt{nP_\ell}$ and the remaining entries to 0.

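The section error rate of a decoder for a SPARC $\mathcal{S}$ is defined as

$$
\mathcal{E}_{\mathrm{sec}}(\mathcal{S}) := \frac{1}{L} \sum_{\ell=1}^{L} \mathbb{1}\{\hat{\beta}_\ell \neq \beta_{0,\ell}\}. \tag{37}
$$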
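The hard decision and the section error rate (37) reduce to comparing, section by section, the location of the largest entry of $\beta^{T^*}$ with the location of the true non-zero entry. A minimal sketch follows; the function names and the toy example with L = 4, M = 3 are hypothetical.

```python
import numpy as np

def hard_decision(beta_T, Pl, n, M):
    """Set the largest entry of each section to sqrt(n P_l) and the rest to 0."""
    L = len(Pl)
    B = np.asarray(beta_T, dtype=float).reshape(L, M)
    out = np.zeros_like(B)
    out[np.arange(L), B.argmax(axis=1)] = np.sqrt(n * np.asarray(Pl))
    return out.ravel()

def section_error_rate(beta_hat, beta0, L, M):
    """(37): fraction of sections whose non-zero location is wrong."""
    wrong = beta_hat.reshape(L, M).argmax(axis=1) != beta0.reshape(L, M).argmax(axis=1)
    return float(np.mean(wrong))

n, M, Pl = 12, 3, np.array([0.4, 0.3, 0.2, 0.1])
L = len(Pl)
beta0 = np.zeros(L * M)
beta0[np.arange(L) * M + np.array([0, 2, 1, 1])] = np.sqrt(n * Pl)
beta_T = np.array([1.5, 0.2, 0.1, 0.0, 0.3, 1.1, 0.9, 0.4, 0.2, 0.1, 0.5, 0.3])
beta_hat = hard_decision(beta_T, Pl, n, M)
err = section_error_rate(beta_hat, beta0, L, M)
```

Here the third section is decoded to the wrong position, so one of the four sections is in error.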


Theorem 1. Fix any rate R < C, and a > 0. Consider a sequence of rate R SPARCs $\{\mathcal{S}_n\}$ indexed by block length n, with design matrix parameters L and $M = L^a$ determined according to (5) and an exponentially decaying power allocation given by (28). Then the section error rate of the AMP decoder (described in (34)–(36), and run for $T^*$ steps) converges to zero almost surely, i.e., for any $\epsilon > 0$,

$$
\lim_{n_0 \to \infty} \mathbb{P}\left( \mathcal{E}_{\mathrm{sec}}(\mathcal{S}_n) < \epsilon,\ \forall n \ge n_0 \right) = 1. \tag{38}
$$


Remarks:

1) The probability measure in (38) is over the Gaussian design matrix A, the Gaussian channel noise w, and the message $\beta$ distributed uniformly in $\mathcal{B}_{M,L}(P_1, \ldots, P_L)$.

2) As in [2], we can construct a concatenated code with an inner SPARC of rate R and an outer Reed-Solomon (RS) code of rate $(1 - 2\epsilon)$. If M is a prime power, a RS code defined over a finite field of order M defines a one-to-one mapping between a symbol of the RS codeword and a section of the SPARC. The concatenated code has rate $R(1 - 2\epsilon)$, and decoding complexity that is polynomial in n. The decoded message $\hat{\beta}$ equals $\beta$ whenever the section error rate of the SPARC is less than $\epsilon$. Thus for any $\epsilon > 0$, the theorem guarantees that the probability of message decoding error for a sequence of rate $R(1 - 2\epsilon)$ SPARC-RS concatenated codes will tend to zero, i.e., $\lim \mathbb{P}(\hat{\beta} \neq \beta) = 0$.

The proof of Theorem 1 is given in Section V.


A. Empirical Performance at Finite Blocklengths 

In this section, we make two modifications to the SPARC construction used in Theorem 1 to improve the empirical performance at finite block lengths. First, we introduce a power allocation that yields several orders of magnitude improvement in section error rate for rates R that are not very close to the capacity C. Second, we use a Hadamard design matrix (instead of Gaussian), which facilitates a decoder with $O(N \log N)$ running time and a memory requirement of $O(N)$. In comparison, with a Gaussian design matrix the running time and memory of the AMP decoder are both $O(nN)$. We mention that the recent work [23] considers an AMP decoder with a spatially coupled Hadamard-based design matrix. In our case, the Hadamard design matrix is not spatially coupled; rather, it is the modified power allocation that yields low section error rates.


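Modified Power Allocation: We define a power allocation characterized by two parameters $a, f$. For $f \in [0, 1]$, let
\[ P_\ell = \begin{cases} \kappa \cdot 2^{-2aC\ell/L}, & 1 \leq \ell \leq fL, \\ \kappa \cdot 2^{-2aCf}, & fL + 1 \leq \ell \leq L, \end{cases} \tag{39} \]
where
\[ \kappa = \frac{P\left(2^{2aC/L} - 1\right)}{1 - 2^{-2aCf}\left(1 - L(1 - f)\left(2^{2aC/L} - 1\right)\right)}. \]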


The normalizing constant $\kappa$ ensures that the total power across sections is $P$. For intuition, first assume that $f = 1$. Then (39) implies that $P_\ell \propto 2^{-2aC\ell/L}$ for $\ell \in [L]$. Setting $a = 1$ recovers the original power allocation of (28), while $a = 0$ allocates $P/L$ to each section. Increasing $a$ increases the power allocated to the initial sections, which makes them more likely to decode correctly. This in turn helps by decreasing the effective noise variance $\bar\tau_t^2$ in subsequent AMP iterations. However, if $a$ is too large, the final sections may have too little power to decode correctly.
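The sketch below implements allocation (39) together with its normalizing constant $\kappa$. It assumes $a > 0$, because at $a = 0$ the expression for $\kappa$ becomes $0/0$. It also assumes $fL$ is an integer, since $\kappa$ is derived for that case. Capacity $C$ is measured in bits, and the helper name is hypothetical. The test checks that the powers sum to $P$ and that the allocation is continuous at section $fL$.

```python
import numpy as np

def modified_power_allocation(P, C, L, a, f):
    """Power allocation (39): exponential decay up to section fL, then flat.

    Assumes a > 0 and f*L integer (kappa is derived for that case).
    """
    fL = int(round(f * L))
    g = 2.0 ** (2 * a * C / L) - 1.0
    kappa = P * g / (1.0 - 2.0 ** (-2 * a * C * f) * (1.0 - L * (1 - f) * g))
    ell = np.arange(1, L + 1)
    Pl = kappa * 2.0 ** (-2 * a * C * ell / L)   # exponential part, sections 1..L
    Pl[fL:] = kappa * 2.0 ** (-2 * a * C * f)    # overwrite sections fL+1..L with flat power
    return Pl
```

With $a = f = 1$, the same function returns the exponentially decaying allocation of (28).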


Hence we want the parameter $a$ to be large enough to ensure that the AMP gets started on the right track, but not much larger. This intuition can be made precise in the large system limit using Lemma 1. Recall that for a section $\ell$ to be correctly decoded in step $(t+1)$, the limit of $LP_\ell$ must exceed a threshold proportional to $R\bar\tau_t^2$. For rates close to $C$, we need $a$ to be close to 1 for the initial sections to cross this threshold and get decoding started correctly. On the other hand, for rates such as $R = 0.6C$, $a = 1$ allocates more power than necessary to the initial sections, leading to poor decoding performance in the final sections.


In addition, we found that the section error rate can be further improved by flattening the power allocation in the final sections. For a given $a$, (39) has an exponential power allocation until section $fL$, and constant power for the remaining $(1 - f)L$ sections. The allocation in (39) is continuous, i.e., each section in the flat part is allocated the same power as the final section in the exponential part. Flattening boosts the power given to the final sections compared to an exponentially decaying allocation. The two parameters $(a, f)$ let us trade off between two conflicting objectives: assigning enough power to the initial sections, and ensuring that the final sections have enough power to be decoded correctly.

The constants $\bar\tau_t^2$ and $\bar x_t$: Analogous to Lemma 2, the large system limit values of the state evolution parameters for the power allocation in (39) can be obtained from Lemma 1. Set $\bar\tau_0^2 = \sigma^2 + P$, and for $t \geq 0$ compute
\[ \xi_t = \min\left\{ \frac{1}{2aC}\log\left( \frac{aCP\, 2^{2aCf}}{R\bar\tau_t^2\left[2^{2aCf} + (1 - f)2aC\ln 2 - 1\right]} \right),\ 1 \right\}, \tag{40} \]
\[ \bar x_{t+1} = \frac{1 - 2^{-2aC\xi_t}}{1 + 2^{-2aCf}\left((1 - f)2aC\ln 2 - 1\right)}, \tag{41} \]
\[ \bar\tau_{t+1}^2 = \sigma^2 + P(1 - \bar x_{t+1}). \tag{42} \]
We note that setting $a = f = 1$ in (40)–(42) recovers the limiting state evolution parameters for the exponential power allocation, which were obtained in Lemma 2.

Fig. 4. Section error rate vs $R/C$ at snr $= 15$, $C = 2$ bits. The top solid curve shows the average section error rate of the AMP over 1000 trials with $P_\ell \propto 2^{-2C\ell/L}$. The solid curve in the middle shows the section error rate using the power allocation in (39) with the $(a, f)$ values shown. The SPARC parameters for both these curves are $M = 512$, $L = 1024$. The bottom solid curve shows the section error rate with the same $(a, f)$ values, but with $L = M = 4096$. In all cases, the dashed lines show the state evolution prediction (43) of the section error rate. Missing points at $R = 0.6C$ and $0.65C$ indicate that no errors were observed over 1000 trials.
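The sketch below iterates (40)–(42) numerically. It makes two assumptions beyond the text, both flagged in the code comments:

- $\xi_t$ is clamped at 0 when the logarithm in (40) is negative, meaning no section decodes.
- Every section is counted as decoded ($\bar x_{t+1} = 1$) once the unclamped $\xi_t$ reaches $f$. The reasoning is that all flat sections receive the same power as section $fL$.

For $a = f = 1$ the recursion reduces to the exponential case. There, $\xi_t$ increases by $\log(C/R)/(2C)$ per step, so $\bar x_t$ reaches 1 after $\lceil 2C/\log(C/R)\rceil$ steps. The function name is illustrative.

```python
import math

def modified_state_evolution(snr, rate_frac, a, f, sigma2=1.0, T=20):
    """Iterate (40)-(42) for allocation (39); returns [x_0, x_1, ..., x_T].

    rate_frac = R / C; capacity and rate are in bits, logs are base 2.
    """
    P = snr * sigma2
    C = 0.5 * math.log2(1.0 + snr)
    R = rate_frac * C
    ln2 = math.log(2.0)
    D = 2.0 ** (2 * a * C * f) + (1 - f) * 2 * a * C * ln2 - 1.0
    denom = 1.0 + 2.0 ** (-2 * a * C * f) * ((1 - f) * 2 * a * C * ln2 - 1.0)
    tau2 = sigma2 + P                       # tau_0^2
    xs = [0.0]
    for _ in range(T):
        arg = a * C * P * 2.0 ** (2 * a * C * f) / (R * tau2 * D)
        xi = math.log2(arg) / (2 * a * C)   # unclamped exponent from (40)
        if xi >= f:
            x = 1.0                         # assumption: the flat part decodes all at once
        else:
            xi = max(xi, 0.0)               # assumption: nothing decodes if xi < 0
            x = (1.0 - 2.0 ** (-2 * a * C * xi)) / denom   # (41)
        xs.append(x)
        tau2 = sigma2 + P * (1.0 - x)       # (42)
    return xs
```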

Experimental Results: Fig. 4 shows the performance of the AMP at different rates. Given the values of $M, L$, the block length $n$ is determined by the rate $R$ according to (4). For example, with $M = 512$, $L = 1024$, we have $n = 7680$ for $R = 0.6C$, and $n = 5120$ for $R = 0.9C$. The solid curve at the top shows the average section error rate of the AMP (over 1000 runs) with an exponentially decaying power allocation where $P_\ell \propto 2^{-2C\ell/L}$. The solid curve in the middle shows the average section error rate with the power allocation in (39), with values of $(a, f)$ obtained via a rough optimization around an initial guess of $a = f = R/C$. The solid curve at the bottom shows the average section error rate with $L = M = 4096$ and the power allocation in (39), using the same $(a, f)$ values as before.
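The two block lengths quoted above follow from $L \log M = nR$ with $C = 2$ bits (snr $= 15$, as in Fig. 4). Here is a one-line check, assuming base-2 logarithms. The helper name is illustrative.

```python
import math

def block_length(M, L, R):
    """Block length n from L * log2(M) = n * R, with R in bits per channel use."""
    return round(L * math.log2(M) / R)
```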

In all cases, the decoder described in (34)–(36) was used. The constants $\{\bar\tau_t^2\}$ required by the decoder are specified by Lemma 2 for the exponential allocation, and by (40)–(42) for the modified allocation. The simulations for Fig. 4 were run using Hadamard design matrices, which are described below. Across trials, we observed good concentration around the average section error rates. For example, with $M = 512$, $L = 1024$ and $R = 0.75C$, 958 of the 1000 trials had zero errors, and the remaining 42 had only one section in error, for an average section error rate of $4.10 \times 10^{-5}$. Further, all the section errors were in the flat part of the power allocation, as expected. Increasing $L$ tends to improve this concentration, while increasing $M$ reduces the average section error rate. This improvement in the section error rate is illustrated by the bottom curve in Fig. 4.
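The quoted average follows directly from the trial counts: 42 single-section errors spread over 1000 trials of $L = 1024$ sections each.

```latex
\[
\frac{42 \times 1}{1000 \times 1024} = \frac{42}{1\,024\,000} \approx 4.10 \times 10^{-5}.
\]
```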

The dashed curves in Fig. 4 show the section error rate predictions for the two power allocations, obtained from state evolution. Recall from Section III-B that $x_{t+1}$ in (9) can be interpreted as the expectation of the (power-weighted) fraction of correctly decoded sections after step $t+1$. Using arguments similar to Proposition 1, we can show the following: if the test statistic is distributed approximately as $\beta + \bar\tau_t Z$, then the non-weighted expectation of the fraction of correctly decoded sections after step $(t+1)$ is given by
\[ \frac{1}{L}\sum_{\ell=1}^{L} \mathbb{E}\left[ \frac{ e^{\frac{\sqrt{nP_\ell}}{\bar\tau_t}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\bar\tau_t}\right)} }{ e^{\frac{\sqrt{nP_\ell}}{\bar\tau_t}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\bar\tau_t}\right)} + \sum_{j=2}^{M} e^{\frac{\sqrt{nP_\ell}}{\bar\tau_t} U_j^\ell} } \right] =: 1 - \nu_{t+1}, \tag{43} \]
where $U_1^\ell, \ldots, U_M^\ell$ are i.i.d. $\mathcal{N}(0,1)$. Thus $\nu_{T^*}$ is an estimate of the section error rate. We observe that the empirical section error rate in Fig. 4 is close to $\nu_{T^*}$, especially for the larger dictionary.
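The following Monte Carlo sketch estimates the section average in (43), i.e., the state evolution estimate of the fraction of correctly decoded sections. It assumes $U_1^\ell, \ldots, U_M^\ell$ are i.i.d. $\mathcal{N}(0,1)$, with index 1 playing the role of the transmitted entry. The sample size is an arbitrary choice, and the helper name is illustrative.

```python
import numpy as np

def fraction_correct_se(P_sec, n, tau2, M, num_samples=2000, rng=None):
    """Monte Carlo estimate of the average over sections in (43).

    For each section, c = sqrt(n P_l) / tau and the expectation of
    exp(c (U_1 + c)) / (exp(c (U_1 + c)) + sum_{j>=2} exp(c U_j))
    is estimated with U_1, ..., U_M i.i.d. N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    tau = np.sqrt(tau2)
    per_section = []
    for P_l in P_sec:
        c = np.sqrt(n * P_l) / tau
        logits = c * rng.standard_normal((num_samples, M))
        logits[:, 0] += c * c                      # entry 1 is the transmitted one
        logits -= logits.max(axis=1, keepdims=True)
        w = np.exp(logits)
        per_section.append(np.mean(w[:, 0] / w.sum(axis=1)))
    return float(np.mean(per_section))
```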

It is evident that judicious power allocation can yield significant improvements in section error rates. An interesting open question is to find good rules of thumb for the power allocation as a function of rate and snr. For any given allocation, one can determine whether the section error rate goes to zero in the large system limit. Indeed, using Lemma 1 with $\tau^2 = \sigma^2 + P$, we see that those sections $\ell$ for which the indicator in (27) is positive are decoded in the first step; this also gives the value of $\bar x_1$. Then with $\tau^2 = \sigma^2 + P(1 - \bar x_1)$ we can determine which sections are decoded in step 2, and so on. The section error rate goes to zero if and only if $\bar x_{T^*} = 1$. The proof of this is essentially identical to that of Theorem 1.

Thus Lemma 1 gives a straightforward way to check whether a power allocation is good in the large system limit. This can provide some guidance for the finite length case, but the challenge is to choose between several power allocations for which $\bar x_{T^*} = 1$. One way to compare these allocations may be via the state evolution prediction $\nu_{T^*}$ from (43), but this needs additional investigation.
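The step-by-step check described above can be sketched as follows. Lemma 1 and the indicator in (27) are not reproduced in this section, so the decoding rule used here is an assumption. Section $\ell$ is treated as decoded once $LP_\ell > 2\ln 2 \cdot R\tau^2$, with $R$ in bits. The constant $2\ln 2$ is chosen because it makes this rule agree with (40). The sketch also uses the finite-$L$ values $LP_\ell$ in place of their limits, so it only approximates the large-system test. The function name and the `thr_const` parameter are hypothetical.

```python
import math
import numpy as np

def large_system_check(P_sec, R, sigma2, thr_const=2 * math.log(2), max_steps=100):
    """Iterate the section-decoding rule step by step.

    Hypothetical rule (see lead-in): section l counts as decoded once
    L * P_l > thr_const * R * tau^2, with R in bits. Returns x_1, x_2, ...
    until the fraction of decoded power stops changing.
    """
    P_sec = np.asarray(P_sec, dtype=float)
    L, P = P_sec.size, P_sec.sum()
    tau2 = sigma2 + P                         # first step uses tau^2 = sigma^2 + P
    xs = []
    for _ in range(max_steps):
        decoded = L * P_sec > thr_const * R * tau2
        x = P_sec[decoded].sum() / P          # power-weighted fraction decoded
        if xs and x == xs[-1]:
            break
        xs.append(x)
        tau2 = sigma2 + P * (1.0 - x)         # next step: tau^2 = sigma^2 + P(1 - x)
    return xs
```

With the exponentially decaying allocation at snr $= 15$, this check drives $x$ to 1 for $R = 0.6C$ and stalls at 0 for $R = 1.05C$.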

Reducing the decoding complexity using Hadamard dictionaries: The computational complexity of the decoder in (34)–(36) is determined by the matrix-vector multiplications $A\beta^t$ and $A^*z^t$, whose running time is $O(nN)$ if performed in the straightforward way. The remaining operations are $O(N)$. As the number of iterations is finite, the decoding complexity scales linearly with the size of the design matrix. With a Gaussian design matrix, the memory requirement is also proportional to $nN$, since the entire matrix has to be stored. This is the major bottleneck in scaling the AMP decoder to work with large design matrices.

To reduce the decoding complexity and the required memory, we generate $A$ from a Hadamard matrix as follows. Let $N = ML$ be a power of 2, and let $m = \log_2 N$. With $H_0 = 1$, recursively define the $2^m \times 2^m$ matrix $H_m$ as
\[ H_m = \begin{bmatrix} H_{m-1} & H_{m-1} \\ H_{m-1} & -H_{m-1} \end{bmatrix}. \]
The design matrix $A$ is generated by picking $n$ rows uniformly at random from $H_m$ and scaling the resulting matrix by $1/\sqrt{n}$ so that each column has norm one.² Thus the $k$th element of the codeword is $(A\beta)_k = \sum_{j\in[N]} A_{kj}\beta_j$, where $A_{kj} \in \{\pm 1/\sqrt{n}\}$ for $k \in [n]$, $j \in [N]$.

For $A$ generated as above, the matrix-vector multiplications $A\beta^t$ and $A^*z^t$ can be performed efficiently using the fast Walsh–Hadamard Transform (WHT) [27]. Let $S_n$ denote the set of $n$ indices of the rows of $H_m$ that constitute $A$. To compute $A\beta^t$, compute the length-$N$ WHT of $\beta^t$, keep only the elements indexed by $S_n$, and scale by $1/\sqrt{n}$. To compute $A^*z^t$, first extend $z^t \in \mathbb{R}^n$ to a vector $\tilde z^t \in \mathbb{R}^N$ by embedding $z^t$ in the indices corresponding to $S_n$ and setting the remaining entries to zero. Since $H_m$ is symmetric, the length-$N$ WHT of $\tilde z^t$, scaled by $1/\sqrt{n}$, equals $A^*z^t$.

The fast WHT has $O(N \log N)$ running time. Further, we do not need to store $A$; only the vectors $\beta^t$ and $z^t$ need to be kept in memory. Hence the running time and memory requirement of the decoder are now $O(N \log N)$ and $O(N)$, respectively. These substantial improvements allow the use of much larger dictionaries (e.g., $M = L = 4096$), for which AMP decoding with Gaussian matrices is infeasible with standard computing resources. For given values of $n$, $M$, $L$ and power allocation $\{P_\ell\}$, we found the empirical performance with a Hadamard dictionary to be very similar to the Gaussian case.
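Here is a minimal NumPy sketch of the two fast products. It rests on three assumptions:

- $N$ is a power of 2.
- The rows $S_n$ are drawn uniformly without replacement from rows $1, \ldots, N-1$ of $H_m$, excluding the all-ones row as described in footnote 2.
- The scaling is $1/\sqrt{n}$.

The function names are hypothetical. The test compares the fast products against an explicitly constructed $H_m$.

```python
import numpy as np

def fwht(x):
    """Unnormalised fast Walsh-Hadamard transform in O(N log N) operations.

    Uses the Sylvester ordering H_m = [[H_{m-1}, H_{m-1}], [H_{m-1}, -H_{m-1}]],
    so fwht(x) equals H_m @ x. len(x) must be a power of 2.
    """
    y = np.array(x, dtype=float)
    N = y.size
    h = 1
    while h < N:
        blocks = y.reshape(-1, 2, h)              # view: pairs of half-blocks of width h
        a = blocks[:, 0, :].copy()
        b = blocks[:, 1, :].copy()
        blocks[:, 0, :] = a + b                   # butterfly step
        blocks[:, 1, :] = a - b
        h *= 2
    return y


def hadamard_operators(N, n, rng):
    """Hypothetical helper returning x -> A x and z -> A^T z for A = H_m[S_n, :] / sqrt(n).

    S_n is drawn uniformly without replacement from rows 1..N-1 (row 0 is all ones).
    """
    rows = np.sort(rng.choice(np.arange(1, N), size=n, replace=False))
    scale = 1.0 / np.sqrt(n)

    def A_mul(beta):
        return scale * fwht(beta)[rows]           # keep only the rows in S_n

    def At_mul(z):
        z_full = np.zeros(N)
        z_full[rows] = z                          # embed z at the indices in S_n
        return scale * fwht(z_full)               # H_m is symmetric

    return A_mul, At_mul, rows
```

Only `rows`, $\beta^t$ and $z^t$ are stored, which is the $O(N)$ memory footprint described above.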

V. Proof of Theorem 1

The main ingredients in the proof of Theorem 1 are two technical lemmas (Lemma 4 and Lemma 5). We first lay down the notation that will be used in the proof. We then state the two lemmas and use them to prove Theorem 1.

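A. Definitions and Notation for the Proof

For consistency and ease of comparison, we use notation similar to [7]. Define the following column vectors recursively for $t \geq 0$, starting with $\beta^0 = 0$ and $z^0 = y$:
\[ h^{t+1} := \beta_0 - (A^*z^t + \beta^t), \qquad q^t := \beta^t - \beta_0, \qquad b^t := w - z^t, \qquad m^t := -z^t. \tag{44} \]
For $t \geq 1$, let
\[ \lambda_t := -\frac{1}{\bar\tau_{t-1}^2}\left(P - \frac{\|\beta^t\|^2}{n}\right). \tag{45} \]
We then have
\[ b^t + \lambda_t m^{t-1} = Aq^t, \tag{46} \]
which follows from (6) and (44). We also have
\[ h^{t+1} + q^t = A^*m^t. \tag{47} \]
From (46) and (47), we have the matrix equations
\[ X_t = A^*M_t, \qquad Y_t = AQ_t, \tag{48} \]
where
\[ \begin{aligned} X_t &:= [h^1 + q^0 \,|\, h^2 + q^1 \,|\, \cdots \,|\, h^t + q^{t-1}], & Y_t &:= [b^0 \,|\, b^1 + \lambda_1 m^0 \,|\, \cdots \,|\, b^{t-1} + \lambda_{t-1} m^{t-2}], \\ M_t &:= [m^0 \,|\, \cdots \,|\, m^{t-1}], & Q_t &:= [q^0 \,|\, \cdots \,|\, q^{t-1}]. \end{aligned} \tag{49} \]
The notation $[c_1 \,|\, c_2 \,|\, \cdots \,|\, c_k]$ denotes a matrix with columns $c_1, \ldots, c_k$. Additionally define the matrices
\[ B_t := [b^0 \,|\, \cdots \,|\, b^{t-1}], \qquad H_t := [h^1 \,|\, \cdots \,|\, h^t], \qquad \Lambda_t := \text{diag}(\lambda_0, \ldots, \lambda_{t-1}). \tag{50} \]
Note that $M_0, Q_0, B_0, H_0$, and $\Lambda_0$ are all-zero vectors. Using the above we see that
\[ Y_t = B_t + [0 \,|\, M_{t-1}]\Lambda_t \quad \text{and} \quad X_t = H_t + Q_t. \tag{51} \]
We use $m_\parallel^t$ and $q_\parallel^t$ to denote the projections of $m^t$ and $q^t$ onto the column spaces of $M_t$ and $Q_t$, respectively. Let $\alpha^t := (\alpha_0^t, \ldots, \alpha_{t-1}^t)$ and $\gamma^t := (\gamma_0^t, \ldots, \gamma_{t-1}^t)$ be the coefficient vectors of these projections, i.e.,
\[ m_\parallel^t = \sum_{i=0}^{t-1}\alpha_i^t m^i, \qquad q_\parallel^t = \sum_{i=0}^{t-1}\gamma_i^t q^i. \tag{52} \]
The projections of $m^t$ and $q^t$ onto the orthogonal complements of $M_t$ and $Q_t$, respectively, are denoted by
\[ m_\perp^t := m^t - m_\parallel^t, \qquad q_\perp^t := q^t - q_\parallel^t. \tag{53} \]
With $\bar\tau_t^2$ and $\bar x_t$ as defined in Lemma 2, for $t \geq 0$ define
\[ \sigma_t^2 := \bar\tau_t^2 - \sigma^2 = P(1 - \bar x_t). \tag{54} \]
Let $(\sigma_0^\perp)^2 := \sigma_0^2$ and $(\tau_0^\perp)^2 := \bar\tau_0^2$, and for $t > 0$ define
\[ (\sigma_t^\perp)^2 := \sigma_t^2\left(1 - \frac{\sigma_t^2}{\sigma_{t-1}^2}\right), \qquad (\tau_t^\perp)^2 := \bar\tau_t^2\left(1 - \frac{\bar\tau_t^2}{\bar\tau_{t-1}^2}\right). \tag{55} \]
Recall that $\beta_0$ is the message vector chosen by the transmitter. Due to the symmetry of the code construction, we can assume that the non-zeros of $\beta_0$ are in the first entry of each section. Define $\mathscr{S}_{t_1,t_2}$ to be the sigma-algebra generated by
\[ b^0, \ldots, b^{t_1-1},\ m^0, \ldots, m^{t_1-1},\ h^1, \ldots, h^{t_2},\ q^0, \ldots, q^{t_2},\ \text{and } \beta_0, w. \]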


²Strictly speaking, we generate $A$ by uniformly sampling from all rows of $H_m$ except the first. This is because the first row is all ones, while the others have an equal number of 1s and −1s.


Given two random vectors $X, Y$ and a sigma-algebra $\mathscr{S}$, $X|_{\mathscr{S}} \overset{d}{=} Y$ means that the conditional distribution of $X$ given $\mathscr{S}$ equals the distribution of $Y$. For random variables $X, Y$, the notation $X \overset{a.s.}{=} Y$ means that $X$ and $Y$ are equal almost surely. We use the notation $\vec{o}_t(n^{-\delta})$ to denote a vector in $\mathbb{R}^t$ such that each of its coordinates is $o(n^{-\delta})$ (here $t$ is fixed). The identity matrix is denoted by $I$.

The notation "lim" is used to denote the large system limit as $n, M, L \to \infty$; recall that the three quantities are related as $L \log M = nR$, with $M = L^b$. We keep in mind that (given $R$ and $b$) the block length $n$ uniquely determines the dimensions of all the quantities in the system, including $A, \beta_0, w, h^{t+1}, q^t, b^t, m^t$. Thus we have a sequence indexed by $n$ of each of these random quantities, associated with the sequence of SPARCs $\{\mathcal{S}_n\}$.

We next characterize (in Lemma 4) the conditional distribution of the vectors $h^{t+1}$ and $b^t$ given the matrices in (49) as well as $\beta_0$ and $w$. This shows that $h^{t+1}$ and $b^t$ can each be expressed as the sum of an i.i.d. Gaussian random vector and a deviation term. Lemma 5 then shows that these deviation terms are small, in the sense that their section-wise maximum absolute value and norm converge to 0 almost surely. Lemma 5 also provides convergence results for various inner products and functions involving $\{h^{t+1}, q^t, b^t, m^t\}$. These will be used to show that the performance of the AMP decoder in the large system limit is accurately predicted by the state evolution equations (including (31)). In particular, it is shown that the squared error $\frac{1}{n}\|\beta^t - \beta_0\|^2$ converges almost surely to $P(1 - \bar x_t)$ for $0 \leq t \leq T^*$.

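B. Conditional Distribution Lemma

A key ingredient in the proof is the distribution of $A$ conditioned on the sigma-algebra $\mathscr{S}_{t_1,t}$, where $t_1$ is either $t+1$ or $t$. Conditioning on $\mathscr{S}_{t_1,t}$ is equivalent to conditioning on the linear constraints³ $AQ_{t_1} = Y_{t_1}$ and $A^*M_t = X_t$. This observation gives the following lemma.

Lemma 3 ([7, Lemma 10, Lemma 12]). For $0 \leq t < T^*$, the conditional distributions of the vectors in (46) and (47) satisfy the following, provided $n > t$ and $M_t$ and $Q_t$ have full column rank:
\[ A^*m^t|_{\mathscr{S}_{t+1,t}} \overset{d}{=} X_t(M_t^*M_t)^{-1}M_t^*m_\parallel^t + Q_{t+1}(Q_{t+1}^*Q_{t+1})^{-1}Y_{t+1}^*m_\perp^t + \mathsf{P}^\perp_{Q_{t+1}}\tilde A^*m_\perp^t, \]
\[ Aq^t|_{\mathscr{S}_{t,t}} \overset{d}{=} Y_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t + M_t(M_t^*M_t)^{-1}X_t^*q_\perp^t + \mathsf{P}^\perp_{M_t}\hat A q_\perp^t, \]
where $m_\parallel^t, m_\perp^t, q_\parallel^t, q_\perp^t$ are defined in (52) and (53). Here $\tilde A \overset{d}{=} \hat A \overset{d}{=} A$ are random matrices independent of $\mathscr{S}_{t+1,t}$ and $\mathscr{S}_{t,t}$, respectively. The projection $\mathsf{P}^\perp_{M_t} = I - \mathsf{P}_{M_t}$, where $\mathsf{P}_{M_t} = M_t(M_t^*M_t)^{-1}M_t^*$ is the orthogonal projection matrix onto the column space of $M_t$. Similarly, $\mathsf{P}^\perp_{Q_{t+1}} = I - \mathsf{P}_{Q_{t+1}}$, where $\mathsf{P}_{Q_{t+1}} = Q_{t+1}(Q_{t+1}^*Q_{t+1})^{-1}Q_{t+1}^*$.

The distributional characterization of $A^*m^t$ and $Aq^t$ in Lemma 3, together with (46) and (47), leads to the following lemma.

Lemma 4 (Conditional Distribution Lemma). For the vectors $h^{t+1}$ and $b^t$ defined in (44), the following hold for $1 \leq t < T^*$, provided $n > t$ and $M_t$ and $Q_t$ have full column rank:
\[ h^1|_{\mathscr{S}_{1,0}} \overset{d}{=} \bar\tau_0 Z_0 + \Delta_{1,0}, \qquad h^{t+1}|_{\mathscr{S}_{t+1,t}} \overset{d}{=} \frac{\bar\tau_t^2}{\bar\tau_{t-1}^2}h^t + \tau_t^\perp Z_t + \Delta_{t+1,t}, \tag{56} \]
\[ b^0|_{\mathscr{S}_{0,0}} \overset{d}{=} \sigma_0^\perp Z'_0, \qquad b^t|_{\mathscr{S}_{t,t}} \overset{d}{=} \frac{\sigma_t^2}{\sigma_{t-1}^2}b^{t-1} + \sigma_t^\perp Z'_t + \Delta_{t,t}, \tag{57} \]
where $Z_0, Z_t \in \mathbb{R}^N$ and $Z'_0, Z'_t \in \mathbb{R}^n$ are i.i.d. standard Gaussian random vectors that are independent of the corresponding conditioning sigma-algebras. The deviation terms are
\[ \Delta_{1,0} = \left[\left(\frac{\|m^0\|}{\sqrt n} - \bar\tau_0\right)I - \frac{\|m^0\|}{\sqrt n}\mathsf{P}_{q^0}\right]Z_0 + q^0\left(\frac{\|q^0\|^2}{n}\right)^{-1}\left(\frac{(b^0)^*m^0}{n} - \frac{\|q^0\|^2}{n}\right), \tag{58} \]
and for $t > 0$,
\[ \Delta_{t,t} = \sum_{r=0}^{t-1}\gamma_r^t b^r - \frac{\sigma_t^2}{\sigma_{t-1}^2}b^{t-1} + \left[\left(\frac{\|q_\perp^t\|}{\sqrt n} - \sigma_t^\perp\right)I - \frac{\|q_\perp^t\|}{\sqrt n}\mathsf{P}_{M_t}\right]Z'_t + M_t\left(\frac{M_t^*M_t}{n}\right)^{-1}\left(\frac{H_t^*q_\perp^t}{n} - \frac{M_t^*}{n}\left[\lambda_t m^{t-1} - \sum_{r=1}^{t-1}\lambda_r\gamma_r^t m^{r-1}\right]\right), \tag{59} \]
\[ \Delta_{t+1,t} = \sum_{r=0}^{t-1}\alpha_r^t h^{r+1} - \frac{\bar\tau_t^2}{\bar\tau_{t-1}^2}h^t + \left[\left(\frac{\|m_\perp^t\|}{\sqrt n} - \tau_t^\perp\right)I - \frac{\|m_\perp^t\|}{\sqrt n}\mathsf{P}_{Q_{t+1}}\right]Z_t + Q_{t+1}\left(\frac{Q_{t+1}^*Q_{t+1}}{n}\right)^{-1}\left(\frac{B_{t+1}^*m_\perp^t}{n} - \frac{Q_{t+1}^*}{n}\left[q^t - \sum_{i=0}^{t-1}\alpha_i^t q^i\right]\right). \tag{60} \]

³While conditioning on the linear constraints, we emphasize that only $A$ is treated as random.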


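Proof: We first demonstrate (57). By (44) it follows that
\[ b^0 = w - z^0 = -A\beta_0 = Aq^0 \overset{d}{=} \frac{\|q^0\|}{\sqrt n}Z'_0, \]
where $Z'_0 \in \mathbb{R}^n$ is an i.i.d. standard Gaussian random vector, independent of $\mathscr{S}_{0,0}$. The result follows since $\|q^0\| = \|\beta_0\| = \sqrt{nP} = \sqrt{n}\,\sigma_0$.

For the case $t \geq 1$, we use Lemma 3 to write
\[ \begin{aligned} b^t|_{\mathscr{S}_{t,t}} &\overset{d}{=} (Aq^t - \lambda_t m^{t-1})|_{\mathscr{S}_{t,t}} \\ &= Y_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t + M_t(M_t^*M_t)^{-1}X_t^*q_\perp^t + \mathsf{P}^\perp_{M_t}\hat A q_\perp^t - \lambda_t m^{t-1} \\ &= B_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t + [0\,|\,M_{t-1}]\Lambda_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t + M_t(M_t^*M_t)^{-1}H_t^*q_\perp^t + \mathsf{P}^\perp_{M_t}\hat A q_\perp^t - \lambda_t m^{t-1}. \end{aligned} \]
The last equality above is obtained using (51) and $Q_t^*q_\perp^t = 0$. Note that $\mathsf{P}^\perp_{M_t}\hat A q_\perp^t = (I - \mathsf{P}_{M_t})\hat A q_\perp^t \overset{d}{=} (I - \mathsf{P}_{M_t})\frac{\|q_\perp^t\|}{\sqrt n}Z'_t$, and that $B_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t = \sum_{i=0}^{t-1}\gamma_i^t b^i$. It follows that
\[ b^t|_{\mathscr{S}_{t,t}} \overset{d}{=} \sum_{i=0}^{t-1}\gamma_i^t b^i + (I - \mathsf{P}_{M_t})\frac{\|q_\perp^t\|}{\sqrt n}Z'_t + M_t(M_t^*M_t)^{-1}H_t^*q_\perp^t + [0\,|\,M_{t-1}]\Lambda_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t - \lambda_t m^{t-1}, \tag{61} \]
where $Z'_t \in \mathbb{R}^n$ is an i.i.d. standard Gaussian random vector. All the quantities on the RHS of (61) except $Z'_t$ are in the conditioning sigma-field. We can rewrite (61) as
\[ b^t|_{\mathscr{S}_{t,t}} \overset{d}{=} \frac{\sigma_t^2}{\sigma_{t-1}^2}b^{t-1} + \sigma_t^\perp Z'_t + \Delta_{t,t}, \]
where
\[ \Delta_{t,t} = \sum_{r=0}^{t-1}\gamma_r^t b^r - \frac{\sigma_t^2}{\sigma_{t-1}^2}b^{t-1} + \left[\left(\frac{\|q_\perp^t\|}{\sqrt n} - \sigma_t^\perp\right)I - \frac{\|q_\perp^t\|}{\sqrt n}\mathsf{P}_{M_t}\right]Z'_t + [0\,|\,M_{t-1}]\Lambda_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t + M_t(M_t^*M_t)^{-1}H_t^*q_\perp^t - \lambda_t m^{t-1}. \]
This definition of $\Delta_{t,t}$ equals the one given in (59). To see this, note that $(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t = \gamma^t$, and hence
\[ \begin{aligned} &M_t\left(\frac{M_t^*M_t}{n}\right)^{-1}\frac{M_t^*}{n}\left[\lambda_t m^{t-1} - \sum_{i=0}^{t-2}\lambda_{i+1}\gamma_{i+1}^t m^i\right] + [0\,|\,M_{t-1}]\Lambda_t(Q_t^*Q_t)^{-1}Q_t^*q_\parallel^t - \lambda_t m^{t-1} \\ &\quad = \lambda_t m^{t-1} - \sum_{i=0}^{t-2}\lambda_{i+1}\gamma_{i+1}^t m^i + \sum_{j=0}^{t-2}\lambda_{j+1}\gamma_{j+1}^t m^j - \lambda_t m^{t-1} = 0. \end{aligned} \]
This completes the proof of (57). Result (56) can be shown similarly. ■

The conditional distribution representation in Lemma 4 implies that for each $t \geq 0$, $h^{t+1}$ is the sum of an i.i.d. $\mathcal{N}(0, \bar\tau_t^2)$ random vector plus a deviation term. Indeed, if we assume that $h^t$ has the representation $\bar\tau_{t-1}Z_{t-1} + \Delta_t$, then Lemma 4 implies
\[ h^{t+1}|_{\mathscr{S}_{t+1,t}} \overset{d}{=} \frac{\bar\tau_t^2}{\bar\tau_{t-1}^2}\left(\bar\tau_{t-1}Z_{t-1} + \Delta_t\right) + \tau_t^\perp Z_t + \Delta_{t+1,t} \overset{d}{=} \bar\tau_t\tilde Z_t + \frac{\bar\tau_t^2}{\bar\tau_{t-1}^2}\Delta_t + \Delta_{t+1,t}. \tag{62} \]
To obtain the last equality, we combine the independent Gaussians $Z_{t-1}$ and $Z_t$ into a single standard Gaussian vector $\tilde Z_t$, using the expression for $\tau_t^\perp$ in (55). Similarly, $b^t$ is the sum of an i.i.d. $\mathcal{N}(0, \sigma_t^2)$ random vector and a deviation term. The next lemma shows that these deviation terms are $o(n^{-\delta})$ for some $\delta > 0$.

C. Main Convergence Lemma

Definition 1. A function $\phi: \mathbb{R}^m \to \mathbb{R}$ is pseudo-Lipschitz of order $k$ (denoted by $\phi \in PL(k)$) if there exists a constant $C > 0$ such that for all $x, y \in \mathbb{R}^m$,
\[ |\phi(x) - \phi(y)| \leq C\left(1 + \|x\|^{k-1} + \|y\|^{k-1}\right)\|x - y\|. \tag{63} \]
In the lemma below, $\delta$ is a generic positive number whose exact value is not required. The value of $\delta$ may differ between statements of the lemma. We will say that a sequence $x_n$ converges to a constant $c$ at rate $n^{-\delta}$ if $\lim_{n\to\infty} n^\delta(x_n - c) = 0$.

Lemma 5. The following statements hold for $0 \leq t < T^*$, where $T^* = \left\lceil \frac{2C}{\log(C/R)} \right\rceil$.

(a) The following statements hold almost surely:
\[ \max_{j\in\text{sec}(\ell)}\left|[\Delta_{t+1,t}]_j\right| = o\!\left(n^{-\delta}\sqrt{\log M}\right), \qquad \max_{j\in\text{sec}(\ell)}|h_j^{t+1}| \leq c_{t+1}\sqrt{\log M} \quad \text{for } \ell \in [L], \tag{64} \]
\[ \lim \frac{\|\Delta_{t,t}\|^2}{n} = 0, \tag{65} \]
where $c_{t+1} > 0$ is a constant not depending on $N, n$. The convergence rate in (65) is $n^{-\delta}$.

(b) i) Consider the following functions defined on $\mathbb{R}^M \times \mathbb{R}^M \times \mathbb{R}^M$. For $x, y, z \in \mathbb{R}^M$, $-1 \leq r \leq s \leq t$, and $\ell \in [L]$, let
\[ \begin{aligned} \phi_{1,\ell}(x,y,z) &= x^*y/M, \\ \phi_{2,\ell}(x,y,z) &= \|\eta_\ell^r(z - x)\|^2/\log M, \\ \phi_{3,\ell}(x,y,z) &= [\eta_\ell^r(z - x) - z]^*[\eta_\ell^s(z - y) - z]/\log M, \\ \phi_{4,\ell}(x,y,z) &= y^*[\eta_\ell^r(z - x) - z]/\log M, \end{aligned} \tag{66} \]
where for $r \geq 0$, $\eta_\ell^r(\cdot)$ is the restriction of $\eta^r$ to section $\ell$, i.e., for $x \in \mathbb{R}^M$,
\[ \eta_{\ell,i}^r(x) := \sqrt{nP_\ell}\,\frac{\exp\left(x_i\sqrt{nP_\ell}/\bar\tau_r^2\right)}{\sum_{j=1}^{M}\exp\left(x_j\sqrt{nP_\ell}/\bar\tau_r^2\right)}, \quad i \in [M]. \]
(Also, $\eta_{\ell,i}^{-1}(\cdot) := 0$ for $i \in [M]$.) Then, for $k \in \{1, 2, 3, 4\}$ and arbitrary constants $(a_0, \ldots, a_t, b_0, \ldots, b_t)$,
\[ \lim n^\delta\left(\frac{1}{L}\sum_{\ell=1}^{L}\phi_{k,\ell}\left(\sum_{r=0}^{t}a_r h_\ell^{r+1}, \sum_{s=0}^{t}b_s h_\ell^{s+1}, \beta_{0\ell}\right) - c_k\right) \tag{67} \]
almost surely equals 0, where
\[ c_k := \lim \frac{1}{L}\sum_{\ell=1}^{L}\mathbb{E}\left[\phi_{k,\ell}\left(\sum_{r=0}^{t}a_r\bar\tau_r Z_{r\ell}, \sum_{s=0}^{t}b_s\bar\tau_s Z_{s\ell}, \beta_{0\ell}\right)\right]. \]
Here $Z_0, \ldots, Z_t$ are length-$N$ Gaussian random vectors independent of $\beta_0$, and $Z_{r\ell}, \beta_{0\ell}, h_\ell^{r+1}$ denote the $\ell$th section of the respective vectors. For $0 \leq s \leq t$, $\{Z_{sj}\}_{j\in[N]}$ are i.i.d. $\sim \mathcal{N}(0,1)$. For each $i \in [N]$, $(Z_{0i}, \ldots, Z_{ti})$ are jointly Gaussian with $\mathbb{E}[\bar\tau_r Z_{ri}\,\bar\tau_t Z_{ti}] = \bar\tau_t^2$ for $0 \leq r \leq t$. The limit defining $c_k$ exists and is finite for each $\phi_{k,\ell}$ in (66).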

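ii) For all pseudo-Lipschitz functions $\phi_b: \mathbb{R}^{t+2} \to \mathbb{R}$ of order two, we have
\[ \lim n^\delta\left(\frac{1}{n}\sum_{i=1}^{n}\phi_b(b_i^0, \ldots, b_i^t, w_i) - \mathbb{E}\left[\phi_b(\sigma_0 Z_0, \ldots, \sigma_t Z_t, \sigma Z_w)\right]\right) = 0 \quad \text{a.s.} \tag{68} \]
The random variables $(Z_0, \ldots, Z_t)$ are jointly Gaussian with $Z_s \sim \mathcal{N}(0,1)$ for $0 \leq s \leq t$, and $\mathbb{E}[\sigma_s Z_s\,\sigma_t Z_t] = \sigma_t^2$ for $0 \leq s \leq t$. Further, $(Z_0, \ldots, Z_t)$ are independent of $Z_w \sim \mathcal{N}(0,1)$.

(c)
\[ \lim \frac{(h^{t+1})^*q^0}{n} \overset{a.s.}{=} 0, \tag{69} \]
\[ \lim \frac{(b^t)^*w}{n} \overset{a.s.}{=} 0. \tag{70} \]
The convergence rate in both (69) and (70) is $n^{-\delta}$.

(d) For all $0 \leq r \leq t$,
\[ \lim \frac{(h^{r+1})^*h^{t+1}}{N} \overset{a.s.}{=} \bar\tau_t^2, \tag{71} \]
\[ \lim \frac{(b^r)^*b^t}{n} \overset{a.s.}{=} \sigma_t^2, \tag{72} \]
where $\sigma_t$ is defined in (54). The convergence rate in both (71) and (72) is $n^{-\delta}$.

(e) For all $0 \leq r \leq t$,
\[ \lim \frac{(q^0)^*q^{t+1}}{n} \overset{a.s.}{=} \sigma_{t+1}^2, \qquad \lim \frac{(q^{r+1})^*q^{t+1}}{n} \overset{a.s.}{=} \sigma_{t+1}^2, \tag{73} \]
\[ \lim \frac{(m^r)^*m^t}{n} \overset{a.s.}{=} \bar\tau_t^2. \tag{74} \]
The convergence rate in both (73) and (74) is $n^{-\delta}$.

(f) For all $0 \leq r, s \leq t$,
\[ \lim \frac{(h^{r+1})^*q^{s+1}}{n} \overset{a.s.}{=} -\sigma_{\max(r,s)+1}^2, \tag{75} \]
\[ \lim \frac{(b^r)^*m^s}{n} \overset{a.s.}{=} \sigma_{\max(r,s)}^2. \tag{76} \]
The convergence rate in both (75) and (76) is $n^{-\delta}$.

(g) The vectors $(\gamma_0^{t+1}, \ldots, \gamma_t^{t+1})$ and $(\alpha_0^t, \ldots, \alpha_{t-1}^t)$ converge entry-wise to the following limits at rate $n^{-\delta}$:
\[ \lim(\gamma_0^{t+1}, \ldots, \gamma_{t-1}^{t+1}, \gamma_t^{t+1}) = \left(0, \ldots, 0, \frac{\sigma_{t+1}^2}{\sigma_t^2}\right), \tag{77} \]
\[ \lim(\alpha_0^t, \ldots, \alpha_{t-2}^t, \alpha_{t-1}^t) = \left(0, \ldots, 0, \frac{\bar\tau_t^2}{\bar\tau_{t-1}^2}\right), \quad t \geq 1. \tag{78} \]

(h)
\[ \lim \frac{\|q_\perp^{t+1}\|^2}{n} \overset{a.s.}{=} (\sigma_{t+1}^\perp)^2, \tag{79} \]
\[ \lim \frac{\|m_\perp^t\|^2}{n} \overset{a.s.}{=} (\tau_t^\perp)^2, \tag{80} \]
where $\sigma_t^\perp, \tau_t^\perp$, defined in (55), are strictly positive for $t < T^*$. The convergence rate in both (79) and (80) is $n^{-\delta}$.

The lemma is proved in Section VI.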


D. Comments on Lemmas 4 and 5

To prove Theorem 1, the main result we need from Lemma 5 is that for each $t \geq 0$, $\frac{1}{n}\|\beta^{t+1} - \beta_0\|^2 = \frac{\|q^{t+1}\|^2}{n}$ converges to $\sigma_{t+1}^2$ with probability 1. This result is used in Section V-E below to prove Theorem 1. The convergence of $\|q^{t+1}\|^2/n$ is shown in part (e) of Lemma 5 by appealing to part (b).i. That part shows that, within the functions listed in (66), $h^{t+1} = \beta_0 - (A^*z^t + \beta^t)$ (the difference between the true signal and the test statistic) can be replaced by $\bar\tau_t Z_t$ in the large system limit.

While the results in Lemmas 4 and 5 are similar to those found in [7, Lemma 1], there are a few key differences.

• The functions listed in (66) all act section-wise on the vectors $\{h^t\}_{t\geq 0}$. Recall that the structure of $\beta_0$ implies that the $h^t \in \mathbb{R}^{ML}$ are section-wise independent, where the section size $M = L^b = \Theta((n/\log n)^b)$. This is in contrast to the functions considered in [7] (and in part (b).ii), which act component-wise on vectors whose components are i.i.d.

• To prove part (b).i of Lemma 5 for the section-wise functions in (66), we first need to show that the deviation terms $\Delta_{t+1,t}$ (defined in Lemma 4) can be neglected in the large system limit. This is done by showing in part (a) of Lemma 5 (see (64)) that
\[ \max_{j\in\text{sec}(\ell)}\left|[\Delta_{t+1,t}]_j\right| = o\!\left(n^{-\delta}\sqrt{\log M}\right). \]
To prove this, we require the inner product convergence results in the other parts of the lemma to hold with a convergence rate of $n^{-\delta}$ for some $\delta > 0$. This is another difference from [7, Lemma 1], where a minimum rate of convergence was not needed. In our case, without an $n^{-\delta}$ convergence rate, we would only know that the deviation terms satisfy $\max_{j\in\text{sec}(\ell)}|[\Delta_{t+1,t}]_j| = o(\sqrt{\log M})$, and we would not be able to neglect them.

• Other differences between Lemmas 4–5 and [7, Lemma 1] include:

- Lemma 4 characterizes the conditional distribution of the vectors $h^{t+1}$ and $b^t$, given the matrices in (49) as well as $\beta_0$ and $w$, as the sum of an ideal distribution and a deviation term. Lemma 4 should be compared to [7, Lemma 1(a)], which is a similar distributional characterization of $h^{t+1}$ and $b^t$ but does not use the ideal distribution. We found that working with the ideal distribution throughout Lemma 5 simplified our proof.

- Lemma 5 gives explicit values for the deterministic limits in parts (c)–(h), which are required in other parts of our proof.
E. Proof of Theorem 1

From the definition in (37), the event that the section error rate is larger than $\epsilon$ can be written as
\[ \{\mathcal{E}_{\text{sec}}(\mathcal{S}_n) > \epsilon\} = \left\{\sum_{\ell=1}^{L}\mathbb{1}\{\hat\beta_\ell \neq \beta_{0\ell}\} > L\epsilon\right\}. \tag{81} \]
When a section $\ell$ is decoded in error, the correct non-zero entry has no more than half the total mass of section $\ell$ at the termination step $T^*$. That is, $\beta^{T^*}_{\text{sent}(\ell)} \leq \frac{1}{2}\sqrt{nP_\ell}$, where $\text{sent}(\ell)$ is the index of the non-zero entry in section $\ell$ of the true message $\beta_0$. Since $\beta_{0,\text{sent}(\ell)} = \sqrt{nP_\ell}$, we therefore have
\[ \hat\beta_\ell \neq \beta_{0\ell} \ \Longrightarrow\ \left|\beta^{T^*}_{\text{sent}(\ell)} - \beta_{0,\text{sent}(\ell)}\right|^2 \geq \frac{nP_\ell}{4}, \qquad \ell \in [L]. \tag{82} \]
Hence when the event in (81) occurs, we have
\[ \|\beta^{T^*} - \beta_0\|^2 = \sum_{\ell=1}^{L}\|\beta^{T^*}_\ell - \beta_{0\ell}\|^2 \overset{(a)}{\geq} \sum_{\ell=1}^{L}\mathbb{1}\{\hat\beta_\ell \neq \beta_{0\ell}\}\frac{nP_\ell}{4} \overset{(b)}{>} \frac{L\epsilon\,nP_L}{4} \overset{(c)}{\geq} \frac{n\epsilon\,\sigma^2\ln(1 + \text{snr})}{4}, \tag{83} \]
where (a) follows from (82); (b) is obtained using (81) and the fact that $P_\ell > P_L$ for $\ell \in [L-1]$ for the exponentially decaying power allocation in (28); and (c) is obtained using the first-order Taylor series lower bound $LP_L \geq \sigma^2\ln(1 + \text{snr})$. We therefore conclude that
\[ \{\mathcal{E}_{\text{sec}}(\mathcal{S}_n) > \epsilon\} \subseteq \left\{\frac{\|\beta^{T^*} - \beta_0\|^2}{n} > \frac{\epsilon\,\sigma^2\ln(1 + \text{snr})}{4}\right\}. \tag{84} \]
Now, from (73) of Lemma 5(e), we know that
\[ \lim \frac{\|\beta^{T^*} - \beta_0\|^2}{n} = \lim \frac{\|q^{T^*}\|^2}{n} \overset{a.s.}{=} P(1 - \bar x_{T^*}) \overset{(a)}{=} 0, \tag{85} \]
where (a) follows from Lemma 2. Lemma 2 implies that $\xi_{T^*-1} = 1$ for $T^* = \left\lceil\frac{2C}{\log(C/R)}\right\rceil$, and hence $\bar x_{T^*} = 1$. Thus we have shown in (85) that $\frac{\|\beta^{T^*} - \beta_0\|^2}{n}$ converges almost surely to zero, i.e.,
\[ \lim_{n_0\to\infty}P\left(\frac{\|\beta^{T^*} - \beta_0\|^2}{n} < \epsilon,\ \forall n \geq n_0\right) = 1 \tag{86} \]
for any $\epsilon > 0$. From (84), this implies that for $\epsilon' = \frac{4\epsilon}{\sigma^2\ln(1 + \text{snr})}$,
\[ \lim_{n_0\to\infty}P\left(\mathcal{E}_{\text{sec}}(\mathcal{S}_n) < \epsilon',\ \forall n \geq n_0\right) = 1. \tag{87} \]
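Step (c) of (83) uses $LP_L \geq \sigma^2\ln(1 + \text{snr})$. Assume allocation (28) has the form $P_\ell = \kappa\,2^{-2C\ell/L}$, normalized to total power $P$. Then $\kappa = P(2^{2C/L} - 1)/(1 - 2^{-2C})$. Since $2^{2C} - 1 = \text{snr}$, this gives $LP_L = \sigma^2 L(2^{2C/L} - 1)$. Applying $e^x - 1 \geq x$ yields $LP_L \geq \sigma^2 \cdot 2C\ln 2 = \sigma^2\ln(1 + \text{snr})$. The sketch below checks this numerically. The helper name is illustrative.

```python
import math

def L_times_last_power(snr, L, sigma2=1.0):
    """L * P_L for the allocation P_l = kappa * 2^(-2 C l / L) summing to P (assumed form of (28))."""
    P = snr * sigma2
    C = 0.5 * math.log2(1.0 + snr)                 # capacity in bits
    kappa = P * (2.0 ** (2 * C / L) - 1.0) / (1.0 - 2.0 ** (-2 * C))
    return L * kappa * 2.0 ** (-2 * C)
```

As $L$ grows, the bound becomes tight, which is why (c) only loses a vanishing factor.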


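VI. Proof of Lemma 5

A. Useful Probability and Linear Algebra Results

We list some results that will be used in the proof of Lemma 5. Most of these can be found in [7, Section III.G], but we summarize them here for completeness.

Fact 1. Let $u \in \mathbb{R}^N$ and $v \in \mathbb{R}^n$ be deterministic vectors such that $\lim_{n\to\infty}\|u\|^2/n$ and $\lim_{n\to\infty}\|v\|^2/n$ both exist and are finite. Let $A \in \mathbb{R}^{n\times N}$ be a matrix with independent $\mathcal{N}(0, 1/n)$ entries. Then:

(a)
\[ Au \overset{d}{=} \frac{\|u\|}{\sqrt n}Z_u \quad \text{and} \quad A^*v \overset{d}{=} \frac{\|v\|}{\sqrt n}Z_v, \tag{88} \]
where $Z_u \in \mathbb{R}^n$ and $Z_v \in \mathbb{R}^N$ are each i.i.d. standard Gaussian random vectors. Consequently,
\[ \lim_{n\to\infty}\frac{\|Au\|^2}{n} \overset{a.s.}{=} \lim_{n\to\infty}\frac{\|u\|^2}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}Z_{u,i}^2 = \lim_{n\to\infty}\frac{\|u\|^2}{n}, \tag{89} \]
\[ \lim_{n\to\infty}\frac{\|A^*v\|^2}{N} \overset{a.s.}{=} \lim_{n\to\infty}\frac{\|v\|^2}{n}\cdot\frac{1}{N}\sum_{j=1}^{N}Z_{v,j}^2 = \lim_{n\to\infty}\frac{\|v\|^2}{n}. \tag{90} \]

(b) Let $\mathcal{W}$ be a $d$-dimensional subspace of $\mathbb{R}^n$ for $d \leq n$. Let $(w_1, \ldots, w_d)$ be an orthogonal basis of $\mathcal{W}$ with $\|w_i\|^2 = n$ for $i \in [d]$, and let $\mathsf{P}_{\mathcal{W}}$ denote the orthogonal projection operator onto $\mathcal{W}$. Then for $D = [w_1 \,|\, \cdots \,|\, w_d]$, we have $\mathsf{P}_{\mathcal{W}}Au \overset{d}{=} \frac{\|u\|}{\sqrt n}\mathsf{P}_{\mathcal{W}}Z_u = \frac{\|u\|}{\sqrt n}Dx$, where $x \in \mathbb{R}^d$ is a random vector with i.i.d. $\mathcal{N}(0, 1/n)$ entries. Therefore $\lim_{n\to\infty}n^\delta\|x\| = 0$ almost surely for any constant $\delta \in [0, 0.5)$. (The limit is taken with $d$ fixed.)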


Fact 2 (Strong Law for Triangular Arrays). Let $\{X_{n,i} : i \in [n], n \geq 1\}$ be a triangular array of random variables such that for each $n$, $(X_{n,1}, \ldots, X_{n,n})$ are mutually independent, have zero mean, and satisfy
\[ \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}|X_{n,i}|^{2+\kappa} \leq c\,n^{\kappa/2} \quad \text{for some } \kappa \in (0,1) \text{ and } c < \infty. \tag{91} \]
Then $\frac{1}{n}\sum_{i=1}^{n}X_{n,i} \to 0$ almost surely as $n \to \infty$.

Fact 3. Let $v \in \mathbb{R}^n$ be a random vector with i.i.d. entries $\sim p_V$, where the measure $p_V$ has bounded second moment. Then for any function $\psi$ that is pseudo-Lipschitz of order two,
\[ \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\psi(v_i) \overset{a.s.}{=} \mathbb{E}_{p_V}[\psi(V)], \tag{92} \]
with convergence rate $n^{-\delta}$ for some $\delta \in (0, 1/4)$.


Fact 4 (Stein's lemma). For zero-mean jointly Gaussian random variables $Z_1, Z_2$, and any function $f: \mathbb{R} \to \mathbb{R}$ for which $\mathbb{E}[Z_1 f(Z_2)]$ and $\mathbb{E}[f'(Z_2)]$ both exist, we have $\mathbb{E}[Z_1 f(Z_2)] = \mathbb{E}[Z_1 Z_2]\,\mathbb{E}[f'(Z_2)]$.
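A quick Monte Carlo illustration of Fact 4 is shown below. It uses $f = \tanh$ (so $f'(z) = 1 - \tanh^2 z$) and unit-variance $Z_1, Z_2$ with correlation $\rho$. These choices are arbitrary, and the helper is illustrative only.

```python
import numpy as np

def stein_check(rho, num_samples=1_000_000, seed=0):
    """Monte Carlo check of E[Z1 f(Z2)] = E[Z1 Z2] E[f'(Z2)] with f = tanh."""
    rng = np.random.default_rng(seed)
    Z2 = rng.standard_normal(num_samples)
    # Build Z1 with unit variance and E[Z1 Z2] = rho
    Z1 = rho * Z2 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(num_samples)
    lhs = np.mean(Z1 * np.tanh(Z2))
    rhs = rho * np.mean(1.0 - np.tanh(Z2) ** 2)   # f'(z) = 1 - tanh(z)^2
    return lhs, rhs
```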


Fact 5. Let $v_1, \ldots, v_t$ be a sequence of vectors in $\mathbb{R}^n$ such that for $i \in [t]$,
\[ \frac{1}{n}\left\|v_i - \mathsf{P}_{i-1}(v_i)\right\|^2 \geq c, \]
where $c$ is a positive constant and $\mathsf{P}_{i-1}$ is the orthogonal projection onto the span of $v_1, \ldots, v_{i-1}$. Then the matrix $C \in \mathbb{R}^{t\times t}$ with $C_{ij} = v_i^*v_j/n$ has minimum eigenvalue $\lambda_{\min} \geq c'$, where $c'$ is a strictly positive constant (depending only on $c$ and $t$).


Fact 6. Let $\{S_n\}_{n\geq 1}$ be a sequence of $t \times t$ matrices such that $\lim_{n\to\infty}S_n = S_\infty$, where the limit is element-wise. If $\liminf_{n\to\infty}\lambda_{\min}(S_n) \geq c$ for a positive constant $c$, then $\lambda_{\min}(S_\infty) \geq c$.

Fact 7. Let $Z_1, Z_2, \ldots$ be i.i.d. standard Gaussian random variables. For any constant $K > 2$, with probability 1 we have
\[ \max_{j\in[M]}|Z_j| \leq \sqrt{2K\ln M} \quad \text{for all sufficiently large } M. \]

Proof: For $x \geq 0$, we have $P(\max_{j\in[M]}Z_j > x) = 1 - (P(Z_1 \leq x))^M = 1 - (1 - Q(x))^M$, where $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}}e^{-u^2/2}\,du$. Using $Q(x) \leq e^{-x^2/2}$ for $x \geq 0$ and setting $x = \sqrt{2K\ln M}$, we obtain
\[ P\left(\max_{j\in[M]}Z_j > \sqrt{2K\ln M}\right) \leq 1 - \left(1 - \frac{1}{M^K}\right)^M \leq \frac{1}{M^{K-1}}, \]
where we have used $(1 - y)^M \geq 1 - My$ for $y \in (0,1)$. Hence for $K > 2$, we have
\[ \sum_{M=1}^{\infty}P\left(\max_{j\in[M]}Z_j > \sqrt{2K\ln M}\right) \leq \sum_{M=1}^{\infty}\frac{1}{M^{K-1}} < \infty. \]
Therefore the Borel–Cantelli lemma implies that with probability 1, the event $\{\max_{j\in[M]}Z_j > \sqrt{2K\ln M}\}$ occurs only for finitely many $M$. By a symmetrical argument, with probability 1 the event $\{\min_{j\in[M]}Z_j < -\sqrt{2K\ln M}\}$ also occurs only for finitely many $M$. ■
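The simulation below illustrates the scale in Fact 7. It uses a single draw per $M$, so it illustrates the bound rather than proving the almost-sure statement. The choice $K = 3$ is arbitrary, and the helper name is illustrative.

```python
import numpy as np

def max_abs_and_bound(M, K, seed=0):
    """Return max_j |Z_j| over M i.i.d. N(0,1) samples and the bound sqrt(2 K ln M)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(M)
    return np.max(np.abs(Z)), np.sqrt(2.0 * K * np.log(M))
```

Typically the observed maximum sits near $\sqrt{2\ln M}$, comfortably below the bound.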


B. Proof of Lemma 5

The proof proceeds by induction on $t$. We label as $\mathcal{H}_{t+1}$ the results (64), (67), (69), (71), (73), (75), (77), (79), and similarly as $\mathcal{B}_t$ the results (65), (68), (70), (72), (74), (76), (78), (80). The proof consists of four steps:

1) $\mathcal{B}_0$ holds.

2) $\mathcal{H}_1$ holds.

3) If $\mathcal{B}_r, \mathcal{H}_s$ hold for all $r < t$ and $s \leq t$, then $\mathcal{B}_t$ holds.

4) If $\mathcal{B}_r, \mathcal{H}_s$ hold for all $r \leq t$ and $s \leq t$, then $\mathcal{H}_{t+1}$ holds.


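1) Step 1: Showing $\mathcal{B}_0$ holds: We wish to show that (65), (68), (70), (72), (74), (76), (78), and (80) hold when $t = 0$.

(a) $\Delta_{0,0} = 0$, so there is nothing to prove.

(b) From Lemma 4 we note $b^0 = \sigma_0 Z$, where $Z \in \mathbb{R}^n$ is a standard Gaussian vector. We will first use Fact 2 to show that

$$\lim_{n \to \infty} n^{\delta} \Big| \frac{1}{n}\sum_{i=1}^n \phi_b(\sigma_0 Z_i, w_i) - \frac{1}{n}\sum_{i=1}^n \mathbb{E}_{Z}\{\phi_b(\sigma_0 Z_i, w_i)\} \Big| = 0. \tag{93}$$

Let $\tilde Z$ be an independent copy of $Z$. To apply Fact 2 we need to verify that

$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}\big| n^{\delta}\phi_b(\sigma_0 Z_i, w_i) - n^{\delta}\mathbb{E}_{\tilde Z}\{\phi_b(\sigma_0 \tilde Z_i, w_i)\}\big|^{2+\kappa} \le c\, n^{\kappa/2}$$

for some constants $\kappa \in (0,1)$ and $c > 0$. Dropping the subscript $i$ on $Z, \tilde Z$ for brevity, we have

$$\begin{aligned}
\mathbb{E}_Z\big|\phi_b(\sigma_0 Z, w_i) - \mathbb{E}_{\tilde Z}\{\phi_b(\sigma_0 \tilde Z, w_i)\}\big|^{2+\kappa}
&\overset{(a)}{\le} \mathbb{E}_{Z,\tilde Z}\big|\phi_b(\sigma_0 Z, w_i) - \phi_b(\sigma_0 \tilde Z, w_i)\big|^{2+\kappa} \\
&\overset{(b)}{\le} c' |\sigma_0|^{2+\kappa}\, \mathbb{E}_{Z,\tilde Z}\big\{|Z - \tilde Z|^{2+\kappa}\big(1 + |\sigma_0 Z| + |w_i| + |\sigma_0 \tilde Z|\big)^{2+\kappa}\big\} \\
&\le c_0 |\sigma_0|^{2+\kappa}\Big[\mathbb{E}_{Z,\tilde Z}\big\{|Z - \tilde Z|^{2+\kappa}\big(1 + |\sigma_0 Z|^{2+\kappa} + |\sigma_0 \tilde Z|^{2+\kappa}\big)\big\} + |w_i|^{2+\kappa}\,\mathbb{E}_{Z,\tilde Z}\{|Z - \tilde Z|^{2+\kappa}\}\Big] \\
&\overset{(c)}{\le} c_1 + c_2 |w_i|^{2+\kappa},
\end{aligned} \tag{94}$$

where $c', c_0, c_1, c_2$ are positive constants. In the chain above, (a) uses Jensen's inequality, (b) holds because $\phi_b \in PL(2)$, and (c) uses the fact that $Z, \tilde Z$ are i.i.d. $\mathcal{N}(0,1)$. Using (94), we obtain

$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}\big| n^{\delta}\phi_b(\sigma_0 Z, w_i) - n^{\delta}\mathbb{E}_{\tilde Z}\{\phi_b(\sigma_0 \tilde Z, w_i)\}\big|^{2+\kappa} \le n^{\delta(2+\kappa)}\, \frac{1}{n}\sum_{i=1}^n \big(c_1 + c_2 |w_i|^{2+\kappa}\big) \le c\, n^{\kappa/2}$$

almost surely for $\delta \le \frac{\kappa}{2(2+\kappa)}$, since the $w_i$'s are i.i.d. $\mathcal{N}(0, \sigma^2)$. Thus (93) holds.

Finally, considering the expectation in (93), Fact 3 implies

$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}_Z\{\phi_b(\sigma_0 Z_i, w_i)\} \overset{a.s.}{\longrightarrow} \mathbb{E}\{\phi_b(\sigma_0 Z_0, \sigma Z_w)\} \tag{95}$$

at rate $n^{-\delta}$. Combining (93) and (95) yields the result.

(c) The function $\phi_b(b^0_i, w_i) := b^0_i w_i \in PL(2)$. By $\mathcal{B}_0$(b), $\lim \frac{(b^0)^* w}{n} \overset{a.s.}{=} \mathbb{E}\{\sigma_0 Z_0\, \sigma Z_w\} = 0$, and the convergence rate is $n^{-\delta}$.

(d) The function $\phi_b(b^0_i, w_i) := (b^0_i)^2 \in PL(2)$. By $\mathcal{B}_0$(b), $\lim \frac{\|b^0\|^2}{n} \overset{a.s.}{=} \mathbb{E}\{(\sigma_0 Z_0)^2\} = \sigma_0^2$, and the convergence rate is $n^{-\delta}$.

(e) Recall $m^0 = b^0 - w$. The function $\phi_b(b^0_i, w_i) := (b^0_i - w_i)^2 \in PL(2)$. By $\mathcal{B}_0$(b), $\lim \frac{\|m^0\|^2}{n} \overset{a.s.}{=} \mathbb{E}\{(\sigma_0 Z_0 - \sigma Z_w)^2\} = \sigma_0^2 + \sigma^2 = \tau_0^2$, and the convergence rate is $n^{-\delta}$.

(f) The function $\phi_b(b^0_i, w_i) := b^0_i(b^0_i - w_i) \in PL(2)$. By $\mathcal{B}_0$(b), $\lim \frac{(b^0)^* m^0}{n} \overset{a.s.}{=} \mathbb{E}\{\sigma_0 Z_0(\sigma_0 Z_0 - \sigma Z_w)\} = \sigma_0^2$, and the convergence rate is $n^{-\delta}$.

(g) For $t = 0$, there is nothing to prove.

(h) Since $M_0$ is the empty matrix, $m^0_\perp = m^0$, so the result is already shown in $\mathcal{B}_0$(e).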
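The limits in $\mathcal{B}_0$(c)-(f) can be illustrated by simulation. The sketch below draws $b^0 = \sigma_0 Z$ and $w = \sigma Z_w$ and computes the four empirical averages; the parameter values and sample size are arbitrary choices, and this is a Monte Carlo illustration only, not part of the proof.

```python
import random


def b0_empirical_averages(n: int, sigma0: float, sigma: float, seed: int = 0):
    """Draw b^0 = sigma0 * Z and w = sigma * Z_w (independent N(0,1) vectors)
    and return the empirical averages whose limits are stated in B_0(c)-(f)."""
    rng = random.Random(seed)
    b0 = [sigma0 * rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
    m0 = [bi - wi for bi, wi in zip(b0, w)]  # m^0 = b^0 - w
    return {
        "b0_w": sum(bi * wi for bi, wi in zip(b0, w)) / n,    # -> 0
        "b0_b0": sum(bi * bi for bi in b0) / n,               # -> sigma0^2
        "m0_m0": sum(mi * mi for mi in m0) / n,               # -> sigma0^2 + sigma^2 = tau_0^2
        "b0_m0": sum(bi * mi for bi, mi in zip(b0, m0)) / n,  # -> sigma0^2
    }


print(b0_empirical_averages(100_000, sigma0=1.2, sigma=1.0))
```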


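2) Step 2: Showing $\mathcal{H}_1$ holds: We wish to show that (64), (67), (69), (71), (73), (75), (77), and (79) hold when $t = 0$.

(a) From the definition of $\Delta_{1,0}$ in Lemma 4 (58), we have

$$\begin{aligned}
\Delta_{1,0} &= \Big[\Big(\frac{\|m^0\|}{\sqrt n} - \tau_0\Big) I - \frac{\|m^0\|}{\sqrt n}\, \mathsf{P}_{q^0}\Big] Z_0 + q^0\Big(\frac{(b^0)^* m^0}{\|q^0\|^2} - 1\Big) \\
&\overset{d}{=} \Big(\frac{\|m^0\|}{\sqrt n} - \tau_0\Big) Z_0 - \frac{\|m^0\|}{\sqrt n}\, \frac{\bar Z}{\sqrt{nP}}\, q^0 + \frac{q^0}{P}\Big(\frac{(b^0)^* m^0}{n} - P\Big),
\end{aligned} \tag{96}$$

where the second equality follows from Fact 1 with $\bar Z \sim \mathcal{N}(0,1)$ (recall $\|q^0\|^2 = nP$). It follows from (96) that

$$\max_{j \in \text{sec}(\ell)} |[\Delta_{1,0}]_j| \le \Big|\frac{\|m^0\|}{\sqrt n} - \tau_0\Big| \max_{j \in \text{sec}(\ell)} |Z_{0j}| + \frac{\|m^0\|}{\sqrt n}\, \frac{|\bar Z|}{\sqrt n}\, \frac{\sqrt{nP_\ell}}{\sqrt P} + \frac{\sqrt{nP_\ell}}{P}\Big|\frac{(b^0)^* m^0}{n} - P\Big|.$$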

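We show that all terms on the RHS of the above are $o(n^{-\delta}\sqrt{\log M})$ almost surely. Recall $\sqrt{nP_\ell} = O(\sqrt{\log M})$. By $\mathcal{B}_0$(e), $\|m^0\|^2/n \overset{a.s.}{\to} \tau_0^2$ at rate $n^{-\delta}$. This, along with Fact 7, implies that the first term is $o(n^{-\delta}\sqrt{\log M})$ almost surely. Similarly, from $\mathcal{B}_0$(e) and the fact that $|\bar Z|/\sqrt n$ is almost surely $o(n^{-\delta})$, the second term is $o(n^{-\delta}\sqrt{\log M})$; finally, by $\mathcal{B}_0$(f), the third term is also $o(n^{-\delta}\sqrt{\log M})$ almost surely. We have therefore shown that $\max_{j \in \text{sec}(\ell)} |[\Delta_{1,0}]_j| = o(n^{-\delta}\sqrt{\log M})$.

Next, from Lemma 4 (56) it follows that

$$\max_{j \in \text{sec}(\ell)} |h^1_j| \le |\tau_0| \max_{j \in \text{sec}(\ell)} |Z_{0j}| + \max_{j \in \text{sec}(\ell)} |[\Delta_{1,0}]_j| \le |\tau_0| \sqrt{3 \log M} + o(n^{-\delta}\sqrt{\log M}),$$

where we have used Fact 7 for the second inequality. This completes the proof.

(b) The proof of this part involves several claims which are fairly straightforward but tedious to verify, so we give only the main steps, referring the reader to [28] for details. Throughout, we use a generic $\phi_{k,\ell}(x, y, z)$ since the steps are identical for all $k \in \{1,2,3,4\}$. From Lemma 4 (56),

$$\phi_{k,\ell}(a_0 h^1_\ell, b_0 h^1_\ell, \beta_{0\ell}) \overset{d}{=} \phi_{k,\ell}\big(a_0 \tau_0 Z_{0\ell} + a_0 [\Delta_{1,0}]_\ell,\; b_0 \tau_0 Z_{0\ell} + b_0 [\Delta_{1,0}]_\ell,\; \beta_{0\ell}\big).$$

By $\mathcal{H}_1$(a), $\max_{j \in \text{sec}(\ell)} |[\Delta_{1,0}]_j| \overset{a.s.}{=} o(n^{-\delta'}\sqrt{\log M})$ for each $\ell \in [L]$ and some $\delta' > 0$. In [28], the first step of the proof uses this to show that for each of the functions in (66),

$$\frac{1}{L}\sum_{\ell=1}^L \Big|\phi_{k,\ell}\big(a_0 \tau_0 Z_{0\ell} + a_0 [\Delta_{1,0}]_\ell, b_0 \tau_0 Z_{0\ell} + b_0 [\Delta_{1,0}]_\ell, \beta_{0\ell}\big) - \phi_{k,\ell}\big(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell}\big)\Big| \overset{a.s.}{=} o(n^{-\delta'} \log M).$$

Choosing $\delta \in (0, \delta')$ ensures that we can drop the deviation term $\Delta_{1,0}$.

The second step of the proof appeals to Fact 2 to show that

$$\lim n^{\delta}\Big|\frac{1}{L}\sum_{\ell=1}^L \phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell}) - \frac{1}{L}\sum_{\ell=1}^L \mathbb{E}_{Z_0}\{\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell})\}\Big| \overset{a.s.}{=} 0. \tag{97}$$

Let $\tilde Z_0$ be an independent copy of $Z_0$. In order to use Fact 2 to get the above result, we must prove the following for each function in (66), for some $0 < \kappa < 1$ and some constant $c > 0$:

$$\frac{1}{L}\sum_{\ell=1}^L \mathbb{E}\Big| n^{\delta}\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell}) - n^{\delta}\phi_{k,\ell}(a_0 \tau_0 \tilde Z_{0\ell}, b_0 \tau_0 \tilde Z_{0\ell}, \beta_{0\ell})\Big|^{2+\kappa} \le c L^{\kappa/2}. \tag{98}$$

Note that the exact condition required by Fact 2 follows from (98) by an application of Jensen's inequality. In [28], it is shown that for each function in (66) and each $\ell \in [L]$,

$$\mathbb{E}_{Z_0, \tilde Z_0}\big|\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell}) - \phi_{k,\ell}(a_0 \tau_0 \tilde Z_{0\ell}, b_0 \tau_0 \tilde Z_{0\ell}, \beta_{0\ell})\big|^{2+\kappa} = O((\log M)^{2+\kappa}). \tag{99}$$

Bound (99) implies that (98) holds if $\delta(2+\kappa)$ is chosen to be smaller than $\kappa/2$. (Recall $L = \Theta(n/\log n)$.)

The final step of the proof is to show that

$$\lim n^{\delta}\Big|\frac{1}{L}\sum_{\ell=1}^L \mathbb{E}_{Z_0}\{\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell})\} - \frac{1}{L}\sum_{\ell=1}^L \mathbb{E}_{(Z_0, \beta)}\{\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_\ell)\}\Big| = 0.$$

But the above holds because the uniform distribution of the non-zero entry in $\beta_\ell$ over the $M$ possible locations and the i.i.d. distribution of $Z_0$ (and of $\tilde Z_0$) together ensure that for all $\beta_0 \in \mathcal{B}_{M,L}$, we have

$$\mathbb{E}_{Z_0}\{\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_{0\ell})\} = \mathbb{E}_{(Z_0, \beta)}\{\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_\ell)\}, \quad \forall \ell \in [L]. \tag{100}$$

The existence of the limit of $\frac{1}{L}\sum_{\ell=1}^L \mathbb{E}_{(Z_0, \beta)}\{\phi_{k,\ell}(a_0 \tau_0 Z_{0\ell}, b_0 \tau_0 Z_{0\ell}, \beta_\ell)\}$ for $k = 1$ follows from the law of large numbers; for $k = 2, 3, 4$, the limit follows from Appendix D.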
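The symmetry argument behind (100) can be seen numerically: for a permutation-equivariant section function, moving the non-zero entry of $\beta_\ell$ to another location while relabeling the Gaussian coordinates leaves the value unchanged, so the expectation over $Z_0$ does not depend on the location. The sketch assumes that (10) has the section-wise softmax form $\eta_i(s) = \sqrt{nP_\ell}\, e^{s_i\sqrt{nP_\ell}/\tau^2} / \sum_{j \in \text{sec}(\ell)} e^{s_j\sqrt{nP_\ell}/\tau^2}$, and it uses the hypothetical test function $\|\eta(\beta_\ell - \tau Z_\ell) - \beta_\ell\|^2$, which is an illustration rather than one of the functions in (66).

```python
import math
import random


def eta_section(s, sqrt_nP, tau2):
    """Section-wise softmax denoiser (assumed form of (10)) for one section."""
    a = [x * sqrt_nP / tau2 for x in s]
    mx = max(a)  # subtract the max for numerical stability
    e = [math.exp(v - mx) for v in a]
    tot = sum(e)
    return [sqrt_nP * v / tot for v in e]


def section_vector(M, loc, sqrt_nP):
    """One section of beta with its single non-zero entry sqrt(nP) at position loc."""
    return [sqrt_nP if i == loc else 0.0 for i in range(M)]


def phi(beta, z, sqrt_nP, tau):
    """Illustrative test function ||eta(beta - tau z) - beta||^2 for one section."""
    est = eta_section([b - tau * zi for b, zi in zip(beta, z)], sqrt_nP, tau * tau)
    return sum((e - b) ** 2 for e, b in zip(est, beta))


def swap(v, j, k):
    """Return a copy of v with entries j and k exchanged."""
    w = list(v)
    w[j], w[k] = w[k], w[j]
    return w


def mc_average(loc, samples, M, sqrt_nP, tau, relabel_from=None):
    """Average phi over Gaussian samples; optionally relabel each sample by
    swapping coordinates relabel_from and loc (a measure-preserving map)."""
    beta = section_vector(M, loc, sqrt_nP)
    total = 0.0
    for z in samples:
        zz = swap(z, relabel_from, loc) if relabel_from is not None else z
        total += phi(beta, zz, sqrt_nP, tau)
    return total / len(samples)


rng = random.Random(3)
M, sqrt_nP, tau = 8, 3.0, 1.0
samples = [[rng.gauss(0.0, 1.0) for _ in range(M)] for _ in range(2000)]
avg_loc0 = mc_average(0, samples, M, sqrt_nP, tau)
avg_loc5 = mc_average(5, samples, M, sqrt_nP, tau, relabel_from=0)
```

Because the swap of two i.i.d. coordinates leaves the law of $Z_\ell$ unchanged, equal sample averages under the relabeling translate into equal expectations for the two locations.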


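(c) Using the fourth function in (66) with $r = -1$, $\lim \frac{(h^1)^* q^0}{n} \overset{a.s.}{=} \lim -\frac{1}{n}\mathbb{E}\{\tau_0 Z_0^* \beta\} = 0$ by $\mathcal{H}_1$(b), and the convergence rate is $n^{-\delta}$.

(d) Using the first function in (66), $\lim \frac{\|h^1\|^2}{N} \overset{a.s.}{=} \lim \frac{1}{N}\mathbb{E}\|\tau_0 Z_0\|^2 = \tau_0^2$ by $\mathcal{H}_1$(b), and the convergence rate is $n^{-\delta}$.

(e) Using the third function in (66), by $\mathcal{H}_1$(b) we have for $r = 0$ or $r = 1$:

$$\lim \frac{(q^r)^* q^1}{n} \overset{a.s.}{=} \lim \frac{1}{n}\mathbb{E}\big[(\eta^{r-1}(\beta - \tau_{r-1} Z_{r-1}) - \beta)^* (\eta^0(\beta - \tau_0 Z_0) - \beta)\big] = \sigma_1^2,$$

and the convergence rate is $n^{-\delta}$. The last equality above is shown in Appendix D.

(f) Using the fourth function in (66) with $r = 0$, by $\mathcal{H}_1$(b) we have

$$\lim n^{\delta}\Big| \frac{1}{n}(h^1)^* q^1 - \frac{1}{n}\sum_{\ell=1}^L \mathbb{E}\big\{\tau_0 Z_{0\ell}^* [\eta^0_\ell(\beta - \tau_0 Z_0) - \beta_\ell]\big\}\Big| = 0 \quad \text{a.s.} \tag{101}$$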


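Consider a single term in the expectation in (101), say $\ell = 1$. We have

$$\mathbb{E}\big\{\tau_0 Z_{0(1)}^* [\eta^0_{(1)}(\beta - \tau_0 Z_0) - \beta_{(1)}]\big\} = \tau_0 \sum_{i=1}^M \mathbb{E}\{Z_{0i}[\eta^0_i(\beta - \tau_0 Z_0) - \beta_i]\}, \tag{102}$$

where $\beta_{(1)} = (\beta_1, \beta_2, \ldots, \beta_M)$ and $Z_{0(1)} = (Z_{01}, Z_{02}, \ldots, Z_{0M})$. Note that for each $i$, the function $\eta^0_i(\cdot)$ depends on all the $M$ indices in the section containing $i$. For each $i \in [M]$, we evaluate the expectation on the RHS of (102) using the law of iterated expectations:

$$\mathbb{E}\{Z_{0i}[\eta^0_i(\beta - \tau_0 Z_0) - \beta_i]\} = \mathbb{E}\Big[\mathbb{E}\big\{Z_{0i}[\eta^0_i(\beta - \tau_0 Z_0) - \beta_i] \,\big|\, \beta_{(1)}, Z_{0(1)\setminus i}\big\}\Big], \tag{103}$$

where the inner expectation is over $Z_{0i}$ conditioned on $\{\beta_{(1)}, Z_{0(1)\setminus i}\}$. Since $Z_{0i}$ is independent of $\{\beta_{(1)}, Z_{0(1)\setminus i}\}$, the latter just act as constants in the inner expectation over $Z_{0i} \sim \mathcal{N}(0,1)$. Applying Stein's lemma (Fact 4) to the inner expectation, we obtain

$$\begin{aligned}
\mathbb{E}\Big[\mathbb{E}\big\{Z_{0i}[\eta^0_i(\beta - \tau_0 Z_0) - \beta_i] \,\big|\, \beta_{(1)}, Z_{0(1)\setminus i}\big\}\Big]
&= \mathbb{E}\Big[\mathbb{E}\Big\{\frac{\partial}{\partial Z_{0i}}[\eta^0_i(\beta - \tau_0 Z_0) - \beta_i] \,\Big|\, \beta_{(1)}, Z_{0(1)\setminus i}\Big\}\Big] \\
&\overset{(a)}{=} -\frac{1}{\tau_0}\mathbb{E}\Big[\mathbb{E}\big\{\eta^0_i(\beta - \tau_0 Z_0)\big(\sqrt{nP_1} - \eta^0_i(\beta - \tau_0 Z_0)\big) \,\big|\, \beta_{(1)}, Z_{0(1)\setminus i}\big\}\Big] \\
&\overset{(b)}{=} -\frac{1}{\tau_0}\mathbb{E}\big\{\eta^0_i(\beta - \tau_0 Z_0)\big(\sqrt{nP_1} - \eta^0_i(\beta - \tau_0 Z_0)\big)\big\},
\end{aligned}$$

where (a) follows from the definition of $\eta^t_i$ in (10), which implies $\frac{\partial \eta^t_i(s)}{\partial s_i} = \frac{\eta^t_i(s)}{\tau_t^2}\big(\sqrt{nP_\ell} - \eta^t_i(s)\big)$ for $i \in \text{sec}(\ell)$, and (b) follows from the law of iterated expectation. Using the above in (103) and (102), we have

$$\mathbb{E}\big\{\tau_0 Z_{0(1)}^* [\eta^0_{(1)}(\beta - \tau_0 Z_0) - \beta_{(1)}]\big\} = -\sum_{i=1}^M \mathbb{E}\big\{\eta^0_i(\beta - \tau_0 Z_0)\big(\sqrt{nP_1} - \eta^0_i(\beta - \tau_0 Z_0)\big)\big\}. \tag{104}$$

The argument above can be repeated for each section $\ell \in [L]$ to obtain a relation analogous to (104). Using this for the expectation in (101), and noting from (10) that $\sum_{i \in \text{sec}(\ell)} \eta^0_i(s) = \sqrt{nP_\ell}$ for all $s$, we obtain

$$\lim \frac{1}{n}(h^1)^* q^1 \overset{a.s.}{=} \lim \Big(\frac{1}{n}\mathbb{E}\|\eta^0(\beta - \tau_0 Z_0)\|^2 - P\Big) = -\sigma_1^2,$$

with convergence rate $n^{-\delta}$. The last equality above follows from Appendix D.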
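The derivative identity invoked in step (a), $\partial \eta^t_i / \partial s_i = \frac{\eta^t_i}{\tau_t^2}(\sqrt{nP_\ell} - \eta^t_i)$, together with the off-diagonal entries $\partial \eta^t_i / \partial s_j = -\eta^t_i \eta^t_j / \tau_t^2$ for $j \ne i$ in the same section, can be checked by central finite differences. The sketch assumes the section-wise softmax form of $\eta$ from (10); the values of $\sqrt{nP_\ell}$, $\tau^2$, and $s$ are arbitrary.

```python
import math
import random


def eta_section(s, sqrt_nP, tau2):
    """Section-wise softmax denoiser (assumed form of (10)) for one section."""
    a = [x * sqrt_nP / tau2 for x in s]
    mx = max(a)
    e = [math.exp(v - mx) for v in a]
    tot = sum(e)
    return [sqrt_nP * v / tot for v in e]


def analytic_jacobian(s, sqrt_nP, tau2):
    """d eta_i / d s_j = eta_i * (sqrt(nP) * 1{i == j} - eta_j) / tau2."""
    eta = eta_section(s, sqrt_nP, tau2)
    M = len(s)
    return [[eta[i] * ((sqrt_nP if i == j else 0.0) - eta[j]) / tau2
             for j in range(M)] for i in range(M)]


def numeric_jacobian(s, sqrt_nP, tau2, h=1e-6):
    """Central finite-difference approximation; column j is d eta / d s_j."""
    M = len(s)
    J = [[0.0] * M for _ in range(M)]
    for j in range(M):
        sp, sm = list(s), list(s)
        sp[j] += h
        sm[j] -= h
        ep, em = eta_section(sp, sqrt_nP, tau2), eta_section(sm, sqrt_nP, tau2)
        for i in range(M):
            J[i][j] = (ep[i] - em[i]) / (2 * h)
    return J


rng = random.Random(7)
s = [rng.gauss(0.0, 1.0) for _ in range(6)]
JA = analytic_jacobian(s, 2.0, 1.5)
JN = numeric_jacobian(s, 2.0, 1.5)
max_err = max(abs(a - b) for ra, rb in zip(JA, JN) for a, b in zip(ra, rb))
```

Each column of the Jacobian sums to zero, reflecting that $\sum_{i \in \text{sec}(\ell)} \eta_i(s) = \sqrt{nP_\ell}$ does not depend on $s$.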

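Finally, recall from $\mathcal{B}_0$(e) that $\|m^0\|^2/n \overset{a.s.}{\to} \tau_0^2$ at rate $n^{-\delta}$. Further, from (45), we observe that

$$\lambda_1 = \frac{1}{\tau_0^2}\Big(\frac{\|\beta^1\|^2}{n} - P\Big) \overset{a.s.}{\longrightarrow} \lim \frac{1}{\tau_0^2}\Big(\frac{1}{n}\mathbb{E}\|\eta^0(\beta - \tau_0 Z_0)\|^2 - P\Big) = -\frac{\sigma_1^2}{\tau_0^2},$$

where the convergence at rate $n^{-\delta}$ follows from $\mathcal{H}_1$(b) applied to the second function in (66).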


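(g) Note that $Q_1^* Q_1$ is invertible since $Q_1^* Q_1 = \|q^0\|^2 = nP > 0$. Then

$$\gamma^1_0 = \Big(\frac{Q_1^* Q_1}{n}\Big)^{-1} \frac{Q_1^* q^1}{n} = \frac{(q^0)^* q^1}{nP} \overset{a.s.}{\longrightarrow} \frac{\sigma_1^2}{P},$$

where the limit follows from $\mathcal{H}_1$(e).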


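(h) Let $\mathsf{P}_{Q_1} = Q_1(Q_1^* Q_1)^{-1} Q_1^*$ be the projection matrix onto the column space of $Q_1 = q^0$. Note that $Q_1^* Q_1$ is invertible since $Q_1^* Q_1 = nP > 0$. Then,

$$\frac{\|q^1_\perp\|^2}{n} = \frac{\|q^1 - \mathsf{P}_{Q_1} q^1\|^2}{n} = \frac{\|q^1\|^2}{n} - \frac{(q^1)^* q^0}{n}\Big(\frac{(q^0)^* q^0}{n}\Big)^{-1}\frac{(q^0)^* q^1}{n}. \tag{105}$$

Using the representation in (105), it follows by $\mathcal{H}_1$(e) that

$$\frac{\|q^1_\perp\|^2}{n} \overset{a.s.}{\longrightarrow} \sigma_1^2 - \frac{(\sigma_1^2)^2}{P} = (\sigma_1^\perp)^2.$$

Finally, note that $\sigma_r^2 = \sigma^2\big((1 + \mathsf{snr})^{1 - \xi_{r-1}} - 1\big)$, with $\xi_{r-1}$ defined in (32). The definition of $\xi_{r-1}$ implies that $(\sigma_r^\perp)^2$ is strictly positive for $r < T^*$, where $T^* = \Big\lceil \frac{2C}{\log(C/R)} \Big\rceil$.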
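A small numerical check of the representation (105): the squared norm of the component of $q^1$ orthogonal to $q^0$ equals $\|q^1\|^2 - ((q^0)^* q^1)^2 / \|q^0\|^2$. The vectors below are random stand-ins, not actual AMP iterates, and `sigma_perp_sq` simply evaluates the limit $\sigma_1^2 - (\sigma_1^2)^2/P$; all names are illustrative.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residual_direct(q0, q1):
    """||q1 - P_{q0} q1||^2, projecting q1 onto span(q0) explicitly."""
    c = dot(q0, q1) / dot(q0, q0)
    r = [b - c * a for a, b in zip(q0, q1)]
    return dot(r, r)


def residual_formula(q0, q1):
    """The same quantity via (105): ||q1||^2 - (q0^* q1)^2 / ||q0||^2."""
    return dot(q1, q1) - dot(q0, q1) ** 2 / dot(q0, q0)


def sigma_perp_sq(sigma1_sq, P):
    """Limit in H_1(h): sigma_1^2 - (sigma_1^2)^2 / P."""
    return sigma1_sq - sigma1_sq ** 2 / P
```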


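3) Step 3: Showing $\mathcal{B}_t$ holds: We wish to show that (65), (68), (70), (72), (74), (76), (78), and (80) hold assuming $\mathcal{B}_r, \mathcal{H}_s$ hold for all $r < t$ and $s \le t$.

(a) Let $\mathsf{M}_t := \frac{M_t^* M_t}{n}$ and $v := \frac{H_t^* q^t_\perp}{n} - \frac{M_t^*}{n}\Big(\lambda_t m^{t-1} - \sum_{r=1}^{t-1} \lambda_r \gamma^t_r m^{r-1}\Big)$. From the definition of $\Delta_{t,t}$ in Lemma 4 (59), we have

$$\begin{aligned}
\Delta_{t,t} &= \sum_{r=0}^{t-2} \gamma^t_r b^r + \Big(\gamma^t_{t-1} - \frac{\sigma_t^2}{\sigma_{t-1}^2}\Big) b^{t-1} + \Big[\Big(\frac{\|q^t_\perp\|}{\sqrt n} - \sigma_t^\perp\Big) I - \frac{\|q^t_\perp\|}{\sqrt n}\, \mathsf{P}_{M_t}\Big] Z'_t + M_t \mathsf{M}_t^{-1} v \\
&\overset{d}{=} \sum_{r=0}^{t-2} \gamma^t_r b^r + \Big(\gamma^t_{t-1} - \frac{\sigma_t^2}{\sigma_{t-1}^2}\Big) b^{t-1} + \Big(\frac{\|q^t_\perp\|}{\sqrt n} - \sigma_t^\perp\Big) Z'_t - \frac{\|q^t_\perp\|}{\sqrt n}\, \frac{\tilde M_t \bar Z'_t}{\sqrt n} + M_t \mathsf{M}_t^{-1} v,
\end{aligned} \tag{106}$$

where we have used Fact 1 to write $\frac{\|q^t_\perp\|}{\sqrt n}\, \mathsf{P}_{M_t} Z'_t \overset{d}{=} \frac{\|q^t_\perp\|}{\sqrt n}\, \frac{\tilde M_t \bar Z'_t}{\sqrt n}$. The matrix $\tilde M_t = [\tilde m^0 | \ldots | \tilde m^{t-1}] \in \mathbb{R}^{n \times t}$ forms an orthogonal basis for the column space of $M_t$ such that $\|\tilde m^s\| = \sqrt n$ for all $s$, and $\bar Z'_t \in \mathbb{R}^t$ is an independent i.i.d. $\mathcal{N}(0,1)$ random vector. Using $M_t \mathsf{M}_t^{-1} v = \sum_{r=0}^{t-1} m^r [\mathsf{M}_t^{-1} v]_{r+1}$ and $\|\tilde m^s\|^2 = n$ in (106), we obtain the bound

$$\frac{\|\Delta_{t,t}\|^2}{n} \le (2t+2)\Big[\sum_{r=0}^{t-2} (\gamma^t_r)^2 \frac{\|b^r\|^2}{n} + \Big(\gamma^t_{t-1} - \frac{\sigma_t^2}{\sigma_{t-1}^2}\Big)^2 \frac{\|b^{t-1}\|^2}{n} + \Big(\frac{\|q^t_\perp\|}{\sqrt n} - \sigma_t^\perp\Big)^2 \frac{\|Z'_t\|^2}{n} + \frac{\|q^t_\perp\|^2}{n}\, \frac{\|\bar Z'_t\|^2}{n} + \sum_{r=0}^{t-1} \big([\mathsf{M}_t^{-1} v]_{r+1}\big)^2 \frac{\|m^r\|^2}{n}\Big]. \tag{107}$$

We show that each term on the RHS of (107) is almost surely $o(n^{-\delta})$. Note that by $\mathcal{H}_t$(g), $\gamma^t_r \overset{a.s.}{\to} 0$ for $0 \le r \le t-2$ and $\gamma^t_{t-1} \overset{a.s.}{\to} \sigma_t^2/\sigma_{t-1}^2$. By $\mathcal{B}_0$(d)-$\mathcal{B}_{t-1}$(d), $\|b^r\|^2/n \overset{a.s.}{\to} \sigma_r^2$ for $0 \le r \le t-1$. These imply that the first and second terms in (107) are $o(n^{-\delta})$ almost surely. By $\mathcal{H}_t$(h), $\|q^t_\perp\|^2/n \overset{a.s.}{\to} (\sigma_t^\perp)^2$; noting that $\|Z'_t\|^2$ and $\|\bar Z'_t\|^2$ are $\chi^2$ random variables, it follows that the third and fourth terms are $o(n^{-\delta})$ almost surely. Finally, by $\mathcal{B}_0$(e)-$\mathcal{B}_{t-1}$(e), $\|m^r\|^2/n \overset{a.s.}{\to} \tau_r^2$ for $0 \le r \le t-1$. Therefore, to prove convergence for the fifth term, we will show that $[\mathsf{M}_t^{-1} v]_{r+1} \overset{a.s.}{\to} 0$ at rate $n^{-\delta}$. Note that

$$[\mathsf{M}_t^{-1} v]_{r+1} = \begin{cases} \lambda_{r+1}\gamma^t_{r+1} + \Big[\mathsf{M}_t^{-1} \frac{H_t^* q^t_\perp}{n}\Big]_{r+1}, & 0 \le r \le t-2, \\[2mm] -\lambda_t + \Big[\mathsf{M}_t^{-1} \frac{H_t^* q^t_\perp}{n}\Big]_{t}, & r = t-1. \end{cases} \tag{108}$$

We show that each of the above coefficients is $o(n^{-\delta})$. Indeed, for $1 \le i \le t$,

$$\Big[\frac{H_t^* q^t_\perp}{n}\Big]_i = \frac{(h^i)^* q^t}{n} - \sum_{r=0}^{t-1} \gamma^t_r \frac{(h^i)^* q^r}{n} \overset{a.s.}{\longrightarrow} \lim \Big(\lambda_t \frac{(m^{i-1})^* m^{t-1}}{n} - \sum_{r=1}^{t-1} \gamma^t_r \lambda_r \frac{(m^{i-1})^* m^{r-1}}{n}\Big), \tag{109}$$

where the convergence (at rate $n^{-\delta}$) follows from $\mathcal{H}_t$(c) and $\mathcal{H}_t$(f). Using (109) in (108), we see that each coefficient of (108) is $o(n^{-\delta})$, which completes the proof.

(b) Using the characterization for $b^t$ obtained in Lemma 4 (57), we have

$$\phi_b(b^0_i, \ldots, b^t_i, w_i) \overset{d}{=} \phi_b\Big(b^0_i, \ldots, b^{t-1}_i, \frac{\sigma_t^2}{\sigma_{t-1}^2} b^{t-1}_i + \sigma_t^\perp Z'_{t_i} + [\Delta_{t,t}]_i, w_i\Big).$$

The deviation term $\Delta_{t,t}$ in the RHS of the above can be dropped. Indeed, defining

$$a_i := \Big(b^0_i, \ldots, b^{t-1}_i, \frac{\sigma_t^2}{\sigma_{t-1}^2} b^{t-1}_i + \sigma_t^\perp Z'_{t_i} + [\Delta_{t,t}]_i, w_i\Big), \qquad c_i := \Big(b^0_i, \ldots, b^{t-1}_i, \frac{\sigma_t^2}{\sigma_{t-1}^2} b^{t-1}_i + \sigma_t^\perp Z'_{t_i}, w_i\Big),$$

we can show that almost surely

$$\Big|\frac{1}{n}\sum_{i=1}^n \phi_b(a_i) - \frac{1}{n}\sum_{i=1}^n \phi_b(c_i)\Big| \overset{(a)}{\le} \frac{c}{n}\sum_{i=1}^n (1 + \|a_i\| + \|c_i\|)\, |[\Delta_{t,t}]_i| \overset{(b)}{\le} c\Big(\frac{1}{n}\sum_{i=1}^n (1 + \|a_i\| + \|c_i\|)^2\Big)^{1/2}\Big(\frac{\|\Delta_{t,t}\|^2}{n}\Big)^{1/2} \overset{(c)}{=} o(n^{-\delta'}). \tag{110}$$

In (110), (a) holds because $\phi_b \in PL(2)$; (b) is obtained using Hölder's inequality; and (c) follows from $\mathcal{B}_t$(a) if $\frac{1}{n}\sum_{i=1}^n \|a_i\|^2$ and $\frac{1}{n}\sum_{i=1}^n \|c_i\|^2$ are bounded and finite. This holds almost surely since

$$\frac{1}{n}\sum_{i=1}^n \|a_i\|^2 \le c'\Big[\sum_{r=0}^{t-1} \frac{\|b^r\|^2}{n} + \Big(\frac{\sigma_t^2}{\sigma_{t-1}^2}\Big)^2 \frac{\|b^{t-1}\|^2}{n} + (\sigma_t^\perp)^2 \frac{\|Z'_t\|^2}{n} + \frac{\|\Delta_{t,t}\|^2}{n} + \frac{\|w\|^2}{n}\Big],$$

and the same bound without the $\Delta_{t,t}$ term holds for $\frac{1}{n}\sum_{i=1}^n \|c_i\|^2$. The RHS above is finite almost surely by $\mathcal{B}_0$(d)-$\mathcal{B}_{t-1}$(d), $\mathcal{B}_t$(a), and the Gaussianity of $w$ and $Z'_t$. Thus, by choosing $\delta < \delta'$, we can work with $c_i$ instead of $a_i$. Next, we use Fact 2 to show that

$$\lim n^{\delta}\Big|\frac{1}{n}\sum_{i=1}^n \phi_b(c_i) - \frac{1}{n}\sum_{i=1}^n \mathbb{E}_{Z'_t}\{\phi_b(c_i)\}\Big| = 0. \tag{111}$$

Let $\tilde Z'_t$ be an independent copy of $Z'_t$, and let $\tilde c_i$ denote $c_i$ with $Z'_{t_i}$ replaced by $\tilde Z'_{t_i}$. To appeal to Fact 2, we need to verify that

$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}\big|n^{\delta}\phi_b(c_i) - \mathbb{E}_{\tilde Z'_t}\{n^{\delta}\phi_b(\tilde c_i)\}\big|^{\tilde\kappa} \le c\, n^{\kappa/2}. \tag{112}$$

In what follows, we drop the $i$ indices on $Z'_t$ and $\tilde Z'_t$ for brevity and define $\tilde\kappa = \kappa + 2$. Using steps similar to (94), we can show that

$$\mathbb{E}\big|\phi_b(c) - \mathbb{E}_{\tilde Z'_t}\{\phi_b(\tilde c)\}\big|^{\tilde\kappa} \le \kappa'\, \mathbb{E}_{Z'_t, \tilde Z'_t}\Big\{|\sigma_t^\perp|^{\tilde\kappa}\, |Z'_t - \tilde Z'_t|^{\tilde\kappa}\Big(1 + |\sigma_t^\perp Z'_t| + |\sigma_t^\perp \tilde Z'_t| + \sum_{r=0}^{t-1} |b^r_i| + |w_i|\Big)^{\tilde\kappa}\Big\} \overset{(a)}{\le} \kappa_1 + \kappa_2\Big(\sum_{r=0}^{t-1} |b^r_i|^{\tilde\kappa} + |w_i|^{\tilde\kappa}\Big) \tag{113}$$

for some constants $\kappa', \kappa_1, \kappa_2 > 0$. In (113), (a) holds since $Z'_t, \tilde Z'_t$ are $\mathcal{N}(0,1)$. Substituting (113) in the LHS of (112) and applying the induction hypotheses $\mathcal{B}_0$(d)-$\mathcal{B}_{t-1}$(d) shows that the condition (112) is satisfied if $\delta \tilde\kappa < \kappa/2$. Thus (111) holds, and it remains to find the limit of $\frac{1}{n}\sum_{i=1}^n \mathbb{E}_{Z'_t}\{\phi_b(c_i)\}$. It can be verified that

$$\phi_b^{EW}(b^0_i, \ldots, b^{t-1}_i, w_i) := \mathbb{E}_{Z'_t}\Big\{\phi_b\Big(b^0_i, \ldots, b^{t-1}_i, \frac{\sigma_t^2}{\sigma_{t-1}^2} b^{t-1}_i + \sigma_t^\perp Z'_t, w_i\Big)\Big\} \in PL(2),$$

and hence the induction hypothesis $\mathcal{B}_{t-1}$(b) implies that the limit of $\frac{1}{n}\sum_{i=1}^n \phi_b^{EW}(b^0_i, \ldots, b^{t-1}_i, w_i)$ is almost surely

$$\mathbb{E}\{\phi_b^{EW}(\sigma_0 Z_0, \ldots, \sigma_{t-1} Z_{t-1}, \sigma Z_w)\} = \mathbb{E}\Big\{\phi_b\Big(\sigma_0 Z_0, \ldots, \sigma_{t-1} Z_{t-1}, \frac{\sigma_t^2}{\sigma_{t-1}} Z_{t-1} + \sigma_t^\perp Z'_t, \sigma Z_w\Big)\Big\},$$

with convergence at rate $n^{-\delta}$. The proof is completed by noting that $\big(\frac{\sigma_t^2}{\sigma_{t-1}} Z_{t-1} + \sigma_t^\perp Z'_t\big)$ is a Gaussian random variable with variance $(\sigma_t^2/\sigma_{t-1})^2 + (\sigma_t^\perp)^2 = \sigma_t^2$, where we have used the definition of $\sigma_t^\perp$ from (55) and the fact that $Z_{t-1}$ and $Z'_t$ are independent. Note also that for $0 \le r \le t-1$,

$$\mathbb{E}\Big[\sigma_r Z_r\Big(\frac{\sigma_t^2}{\sigma_{t-1}} Z_{t-1} + \sigma_t^\perp Z'_t\Big)\Big] \overset{(a)}{=} \frac{\sigma_t^2}{\sigma_{t-1}^2}\, \sigma_r \sigma_{t-1} \mathbb{E}[Z_r Z_{t-1}] \overset{(b)}{=} \sigma_t^2,$$

where (a) holds since $Z_r, Z'_t$ are independent, and (b) holds because $\sigma_r \sigma_{t-1} \mathbb{E}[Z_r Z_{t-1}] = \sigma_{t-1}^2$.

(c) The function $\phi_b(b^0_i, \ldots, b^t_i, w_i) := b^t_i w_i \in PL(2)$. By $\mathcal{B}_t$(b), $\lim \frac{(b^t)^* w}{n} \overset{a.s.}{=} \mathbb{E}\{\sigma_t Z_t\, \sigma Z_w\} = 0$, and the convergence rate is $n^{-\delta}$.

(d) The function $\phi_b(b^0_i, \ldots, b^t_i, w_i) := b^r_i b^t_i \in PL(2)$ for $0 \le r \le t$. By $\mathcal{B}_t$(b), $\lim \frac{(b^r)^* b^t}{n} \overset{a.s.}{=} \mathbb{E}\{\sigma_r Z_r\, \sigma_t Z_t\} = \sigma_t^2$, and the convergence rate is $n^{-\delta}$.

(e) Recall $m^r = b^r - w$ for $0 \le r \le t$. The function $\phi_b(b^0_i, \ldots, b^t_i, w_i) := (b^r_i - w_i)(b^t_i - w_i) \in PL(2)$. By $\mathcal{B}_t$(b), $\lim \frac{(m^r)^* m^t}{n} \overset{a.s.}{=} \mathbb{E}\{(\sigma_r Z_r - \sigma Z_w)(\sigma_t Z_t - \sigma Z_w)\} = \mathbb{E}\{\sigma_r Z_r\, \sigma_t Z_t\} + \sigma^2 = \sigma_t^2 + \sigma^2 = \tau_t^2$, and the convergence rate is $n^{-\delta}$.

(f) The function $\phi_b(b^0_i, \ldots, b^t_i, w_i) := b^r_i(b^s_i - w_i) \in PL(2)$ for $0 \le r, s \le t$. By $\mathcal{B}_t$(b), $\lim \frac{(b^r)^* m^s}{n} \overset{a.s.}{=} \mathbb{E}\{\sigma_r Z_r(\sigma_s Z_s - \sigma Z_w)\} = \sigma^2_{\max(r,s)}$, and the convergence rate is $n^{-\delta}$.

(g) Note that $\alpha^t = \big(\frac{M_t^* M_t}{n}\big)^{-1} \frac{M_t^* m^t}{n}$. We first show that the matrix $\frac{M_t^* M_t}{n}$ is invertible with a finite limit. From the induction hypotheses $\mathcal{B}_0$(e)-$\mathcal{B}_{t-1}$(e), $\lim \frac{(m^r)^* m^s}{n} \overset{a.s.}{=} \tau^2_{\max(r,s)}$ at rate $n^{-\delta}$ for $0 \le r, s \le t-1$. Further, $\mathcal{B}_0$(h)-$\mathcal{B}_{t-1}$(h) and Fact 5 together imply that the smallest eigenvalue of the matrix $\frac{M_t^* M_t}{n}$ is bounded from below by a positive constant for all $n$; then Fact 6 implies that its inverse has a finite limit. Further, the inverse converges to its limit at rate $n^{-\delta}$, as each entry in $\frac{M_t^* M_t}{n}$ converges at this rate. Next, using $\mathcal{B}_0$(e)-$\mathcal{B}_t$(e),

$$\lim \alpha^t = \lim \Big(\frac{M_t^* M_t}{n}\Big)^{-1} \frac{M_t^* m^t}{n} \overset{(a)}{=} C^{-1} e_t\, \tau_t^2 \overset{(b)}{=} \Big[0, \ldots, 0, \frac{\tau_t^2}{\tau_{t-1}^2}\Big]^*. \tag{114}$$

In step (a), the matrix $C \in \mathbb{R}^{t \times t}$ has entries $C_{ij} = \tau^2_{\max(i,j)-1}$ for $1 \le i, j \le t$, and $e_t \in \mathbb{R}^t$ denotes the all-ones column vector. The equality (b) is obtained as follows: first, note that $C^{-1} e_t$ is the solution to $Cx = e_t$. Next, since all the entries in the last column of $C$ are equal to $\tau^2_{t-1}$, by inspection the solution to $Cx = e_t$ is $x = [0, \ldots, 0, (\tau^2_{t-1})^{-1}]^*$, which yields (b) in (114).

(h) Let $\mathsf{P}_{M_t} = M_t(M_t^* M_t)^{-1} M_t^*$ be the projection matrix onto the column space of $M_t$. Note that $M_t^* M_t$ is invertible with a finite limit by $\mathcal{B}_t$(g). Then,

$$\frac{\|m^t_\perp\|^2}{n} = \frac{\|(I - \mathsf{P}_{M_t}) m^t\|^2}{n} = \frac{\|m^t\|^2}{n} - \frac{(m^t)^* M_t}{n}\Big(\frac{M_t^* M_t}{n}\Big)^{-1}\frac{M_t^* m^t}{n}. \tag{115}$$

Using the representation in (115), it follows by $\mathcal{B}_0$(e)-$\mathcal{B}_t$(e) that

$$\lim \frac{\|m^t_\perp\|^2}{n} \overset{(a)}{=} \tau_t^2 - \tau_t^4\, e_t^* C^{-1} e_t \overset{(b)}{=} \tau_t^2 - \frac{\tau_t^4}{\tau_{t-1}^2} = (\tau_t^\perp)^2.$$

In step (a), the matrix $C \in \mathbb{R}^{t \times t}$ has entries $C_{ij} = \tau^2_{\max(i,j)-1}$ for $1 \le i, j \le t$, and $e_t \in \mathbb{R}^t$ denotes the all-ones column vector. The equality (b) follows from the same reasoning as in (114).

Finally, since $\tau_r^2 = \sigma^2(1 + \mathsf{snr})^{1 - \xi_{r-1}}$ for $0 \le r \le t$, the definition of $\xi_{r-1}$ in (32) implies that $(\tau_r^\perp)^2$ is strictly positive for $r < T^*$, where $T^* = \Big\lceil \frac{2C}{\log(C/R)} \Big\rceil$.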
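The closed form in (114) can be checked directly: with $C_{ij} = \tau^2_{\max(i,j)-1}$ and strictly decreasing $\tau_0^2 > \cdots > \tau_{t-1}^2$, solving $Cx = e_t$ should give $x = [0, \ldots, 0, 1/\tau^2_{t-1}]$. The Python sketch below uses a plain Gauss-Jordan solver; the numerical values of $\tau_r^2$ are arbitrary, and the helper names are illustrative.

```python
def build_C(tau2):
    """C[i][j] = tau2[max(i, j)] with 0-based indices, i.e. tau^2_{max(i,j)-1} 1-based."""
    t = len(tau2)
    return [[tau2[max(i, j)] for j in range(t)] for i in range(t)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def alpha_limit(tau2_prev, tau2_t):
    """Limit of alpha^t in (114): tau_t^2 * C^{-1} e_t, e_t the all-ones vector."""
    x = solve(build_C(tau2_prev), [1.0] * len(tau2_prev))
    return [tau2_t * v for v in x]


print(alpha_limit([3.0, 2.0, 1.5, 1.2], 0.9))
```

The same solve also gives $e_t^* C^{-1} e_t = 1/\tau^2_{t-1}$, the quantity used in step (b) of $\mathcal{B}_t$(h).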

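4) Step 4: Showing $\mathcal{H}_{t+1}$ holds:

(a) Let $\mathsf{Q}_{t+1} := \frac{Q_{t+1}^* Q_{t+1}}{n}$ and $v' := \frac{B_{t+1}^* m^t_\perp}{n} - \frac{Q_{t+1}^*}{n}\Big(q^t - \sum_{r=0}^{t-1} \alpha^t_r q^r\Big)$. From the definition of $\Delta_{t+1,t}$ in Lemma 4 (60), we have

$$\begin{aligned}
\Delta_{t+1,t} &= \sum_{r=0}^{t-2} \alpha^t_r h^{r+1} + \Big(\alpha^t_{t-1} - \frac{\tau_t^2}{\tau_{t-1}^2}\Big) h^t + \Big[\Big(\frac{\|m^t_\perp\|}{\sqrt n} - \tau_t^\perp\Big) I - \frac{\|m^t_\perp\|}{\sqrt n}\, \mathsf{P}_{Q_{t+1}}\Big] Z_t + Q_{t+1} \mathsf{Q}_{t+1}^{-1} v' \\
&\overset{d}{=} \sum_{r=0}^{t-2} \alpha^t_r h^{r+1} + \Big(\alpha^t_{t-1} - \frac{\tau_t^2}{\tau_{t-1}^2}\Big) h^t + \Big(\frac{\|m^t_\perp\|}{\sqrt n} - \tau_t^\perp\Big) Z_t - \frac{\|m^t_\perp\|}{\sqrt n} \sum_{r'=0}^{t} \tilde q^{r'} \frac{[\bar Z_{t+1}]_{r'}}{\sqrt n} + Q_{t+1} \mathsf{Q}_{t+1}^{-1} v',
\end{aligned} \tag{116}$$

where we have used Fact 1 to write

$$\frac{\|m^t_\perp\|}{\sqrt n}\, \mathsf{P}_{Q_{t+1}} Z_t \overset{d}{=} \frac{\|m^t_\perp\|}{\sqrt n}\, \frac{\tilde Q_{t+1} \bar Z_{t+1}}{\sqrt n}.$$

The matrix $\tilde Q_{t+1} = [\tilde q^0 | \ldots | \tilde q^t]$ forms an orthogonal basis for the columns of $Q_{t+1}$ such that $\|\tilde q^s\| = \sqrt n$, and $\bar Z_{t+1} \in \mathbb{R}^{t+1}$ is an independent i.i.d. $\mathcal{N}(0,1)$ random vector. It follows from (116) that

$$\begin{aligned}
\max_{j \in \text{sec}(\ell)} |[\Delta_{t+1,t}]_j| \le\; & \sum_{r=0}^{t-2} |\alpha^t_r| \max_{j \in \text{sec}(\ell)} |h^{r+1}_j| + \Big|\alpha^t_{t-1} - \frac{\tau_t^2}{\tau_{t-1}^2}\Big| \max_{j \in \text{sec}(\ell)} |h^t_j| + \Big|\frac{\|m^t_\perp\|}{\sqrt n} - \tau_t^\perp\Big| \max_{j \in \text{sec}(\ell)} |Z_{tj}| \\
& + \frac{\|m^t_\perp\|}{\sqrt n}\, K\sqrt{nP_\ell} \sum_{r'=0}^{t} \frac{|[\bar Z_{t+1}]_{r'}|}{\sqrt n} + K\sqrt{nP_\ell} \sum_{r=0}^{t} \big|[\mathsf{Q}_{t+1}^{-1} v']_{r+1}\big|.
\end{aligned} \tag{117}$$

In the above, we have used $Q_{t+1} \mathsf{Q}_{t+1}^{-1} v' = \sum_{r=0}^{t} q^r [\mathsf{Q}_{t+1}^{-1} v']_{r+1}$, and the fact that both $\max_j |q^r_j|$ and $\max_j |\tilde q^r_j|$ are bounded by $K\sqrt{nP_\ell}$ for some constant $K > 0$.

We show that all terms on the RHS of (117) are $o(n^{-\delta}\sqrt{\log M})$ almost surely. This is true of the first two terms by $\mathcal{H}_1$(a)-$\mathcal{H}_t$(a) and $\mathcal{B}_t$(g), which says that almost surely $|\alpha^t_r| \in o(n^{-\delta})$ for $0 \le r \le t-2$ and $|\alpha^t_{t-1} - (\tau_t^2/\tau_{t-1}^2)| \in o(n^{-\delta})$. By Fact 7 and $\mathcal{B}_t$(h), the third term is almost surely $o(n^{-\delta}\sqrt{\log M})$. Considering the fourth term, $\|m^t_\perp\|/\sqrt n$ has a bounded limiting value by $\mathcal{B}_t$(h), $\sqrt{nP_\ell} = O(\sqrt{\log M})$, and $|[\bar Z_{t+1}]_{r'}|/\sqrt n \in o(n^{-\delta})$ a.s. for $0 \le r' \le t$. Finally, the fifth term is almost surely $o(n^{-\delta}\sqrt{\log M})$ if we can show that $|[\mathsf{Q}_{t+1}^{-1} v']_{r+1}| \in o(n^{-\delta})$ for each $0 \le r \le t$. We prove this in what follows.

Note that

$$[\mathsf{Q}_{t+1}^{-1} v']_{r+1} = \begin{cases} \alpha^t_r + \Big[\mathsf{Q}_{t+1}^{-1} \frac{B_{t+1}^* m^t_\perp}{n}\Big]_{r+1}, & 0 \le r < t, \\[2mm] -1 + \Big[\mathsf{Q}_{t+1}^{-1} \frac{B_{t+1}^* m^t_\perp}{n}\Big]_{t+1}, & r = t. \end{cases} \tag{118}$$

We show that each of the above coefficients is $o(n^{-\delta})$. Indeed, for $1 \le i \le t+1$,

$$\Big[\frac{B_{t+1}^* m^t_\perp}{n}\Big]_i = \frac{(b^{i-1})^* m^t}{n} - \sum_{r=0}^{t-1} \alpha^t_r \frac{(b^{i-1})^* m^r}{n} \overset{a.s.}{\longrightarrow} \lim \Big(\frac{(q^{i-1})^* q^t}{n} - \sum_{r=0}^{t-1} \alpha^t_r \frac{(q^{i-1})^* q^r}{n}\Big),$$

where the convergence (at rate $n^{-\delta}$) follows from $\mathcal{B}_t$(e), $\mathcal{B}_t$(f), and $\mathcal{B}_t$(g) (convergence of $\alpha^t$ to finite values). Therefore, at rate $n^{-\delta}$,

$$\lim \frac{B_{t+1}^* m^t_\perp}{n} = \lim \Big(\frac{Q_{t+1}^* q^t}{n} - \sum_{r=0}^{t-1} \alpha^t_r \frac{Q_{t+1}^* q^r}{n}\Big), \tag{119}$$

and substituting (119) in (118), we see that each coefficient of (118) is $o(n^{-\delta})$. This completes the proof demonstrating $\max_{j \in \text{sec}(\ell)} |[\Delta_{t+1,t}]_j| \overset{a.s.}{=} o(n^{-\delta}\sqrt{\log M})$.

Next, from Lemma 4 (56) it follows that

$$\max_{j \in \text{sec}(\ell)} |h^{t+1}_j| \le \frac{\tau_t^2}{\tau_{t-1}^2} \max_{j \in \text{sec}(\ell)} |h^t_j| + \tau_t^\perp \max_{j \in \text{sec}(\ell)} |Z_{tj}| + \max_{j \in \text{sec}(\ell)} |[\Delta_{t+1,t}]_j| \overset{a.s.}{\le} C_t\sqrt{\log M} + |\tau_t^\perp|\, O(\sqrt{\log M}) + o(n^{-\delta}\sqrt{\log M}).$$

The second inequality above comes from $\mathcal{H}_t$(a), Fact 7, and the first result of $\mathcal{H}_{t+1}$(a) proved above.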

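(b) As in the proof of $\mathcal{H}_1$(b), we provide the main steps of the proof, referring the reader to [28] for details. Throughout, we use a generic $\phi_{k,\ell}(x, y, z)$, as the steps are identical for all $k \in \{1,2,3,4\}$. From Lemma 4 (56),

$$\phi_{k,\ell}\Big(\sum_{u=0}^{t} a_u h^{u+1}_\ell, \sum_{v=0}^{t} b_v h^{v+1}_\ell, \beta_{0\ell}\Big) \overset{d}{=} \phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell} + a_t [\Delta_{t+1,t}]_\ell,\; \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell} + b_t [\Delta_{t+1,t}]_\ell,\; \beta_{0\ell}\Big),$$

where $a'_u = a_u$ and $b'_v = b_v$ for $0 \le u, v \le t-2$, and $a'_{t-1} = a_{t-1} + a_t \tau_t^2/\tau_{t-1}^2$ and $b'_{t-1} = b_{t-1} + b_t \tau_t^2/\tau_{t-1}^2$. By $\mathcal{H}_{t+1}$(a), $\max_{j \in \text{sec}(\ell)} |[\Delta_{t+1,t}]_j| \overset{a.s.}{=} o(n^{-\delta'}\sqrt{\log M})$ for each $\ell \in [L]$ and some $\delta' > 0$. In [28], the first step of the proof uses this to show that for each of the functions in (66),

$$\frac{1}{L}\sum_{\ell=1}^L \Big|\phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell} + a_t [\Delta_{t+1,t}]_\ell, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell} + b_t [\Delta_{t+1,t}]_\ell, \beta_{0\ell}\Big) - \phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell}, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell}, \beta_{0\ell}\Big)\Big|$$

is almost surely $o(n^{-\delta'} \log M)$. Choosing $\delta \in (0, \delta')$ ensures that we can drop the deviation terms $\Delta_{t+1,t}$.

The second step of the proof appeals to Fact 2 to show that the limit of the expression

$$\frac{1}{L}\sum_{\ell=1}^L \Big[\phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell}, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell}, \beta_{0\ell}\Big) - \mathbb{E}_{Z_t}\phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell}, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell}, \beta_{0\ell}\Big)\Big] \tag{120}$$

is almost surely 0. Let $\tilde Z_t$ be an independent copy of $Z_t$. Define the value $\text{diff}_{k,\ell}$ to be the following difference for each $\ell \in [L]$ and each function in (66) with $k = 1, 2, 3, 4$:

$$\text{diff}_{k,\ell} := \phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell}, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell}, \beta_{0\ell}\Big) - \phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp \tilde Z_{t\ell}, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp \tilde Z_{t\ell}, \beta_{0\ell}\Big).$$

In order to use Fact 2 (conditionally on $\mathscr{S}_{t+1,t}$) to get the above result, we must prove that

$$\frac{1}{L}\sum_{\ell=1}^L \mathbb{E}_{Z_t, \tilde Z_t}\big|n^{\delta}\, \text{diff}_{k,\ell}\big|^{2+\kappa} \le c L^{\kappa/2} \tag{121}$$

for some constants $0 < \kappa < 1$ and $c > 0$. The exact condition required by Fact 2 follows from (121) by an application of Jensen's inequality. In [28] it is shown that for each function in (66) and each $\ell \in [L]$,

$$\mathbb{E}_{Z_t, \tilde Z_t}|\text{diff}_{k,\ell}|^{2+\kappa} \overset{a.s.}{=} O((\log M)^{2+\kappa}). \tag{122}$$

Bound (122) implies that (121) holds if $\delta$ is chosen such that $\delta(2+\kappa) < \kappa/2$. Hence the limit of (120) is almost surely 0.

Considering this result, define new functions for $k \in \{1,2,3,4\}$ as

$$\phi^{EW}_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell, \beta_{0\ell}\Big) := \mathbb{E}_{Z_t}\phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u h^{u+1}_\ell + a_t \tau_t^\perp Z_{t\ell}, \sum_{v=0}^{t-1} b'_v h^{v+1}_\ell + b_t \tau_t^\perp Z_{t\ell}, \beta_{0\ell}\Big).$$

Using Jensen's inequality, it can be shown that the induction hypothesis $\mathcal{H}_t$(b) holds for the function $\phi^{EW}_{k,\ell}$ whenever $\mathcal{H}_t$(b) holds for the function $\phi_{k,\ell}$ inside the expectation; this work can be found in [28]. Therefore, the limit of $\frac{1}{L}\sum_{\ell} \phi^{EW}_{k,\ell}(\cdot)$ is almost surely

$$\lim \frac{1}{L}\sum_{\ell=1}^L \mathbb{E}\,\phi_{k,\ell}\Big(\sum_{u=0}^{t-1} a'_u \tau_u Z_{u\ell} + a_t \tau_t^\perp Z_{t\ell}, \sum_{v=0}^{t-1} b'_v \tau_v Z_{v\ell} + b_t \tau_t^\perp Z_{t\ell}, \beta_\ell\Big),$$

and to complete the proof we show that this equals $\lim \frac{1}{L}\sum_{\ell=1}^L \mathbb{E}\,\phi_{k,\ell}\big(\sum_{u=0}^{t} a_u \tau_u Z_{u\ell}, \sum_{v=0}^{t} b_v \tau_v Z_{v\ell}, \beta_\ell\big)$. Recall $a'_{t-1} = a_{t-1} + a_t(\tau_t^2/\tau_{t-1}^2)$ and $b'_{t-1} = b_{t-1} + b_t(\tau_t^2/\tau_{t-1}^2)$. Then, to prove the above, we will show that $(\tau_t^2/\tau_{t-1}) Z_{t-1} + \tau_t^\perp Z_t \overset{d}{=} \tau_t Z_t$, where on the right $\tau_r \tau_t \mathbb{E}[Z_r Z_t] = \tau_t^2$ for $0 \le r \le t-1$. Indeed, $(\tau_t^2/\tau_{t-1}) Z_{t-1} + \tau_t^\perp Z_t$ is Gaussian with variance equal to $(\tau_t^2/\tau_{t-1})^2 + (\tau_t^\perp)^2 = \tau_t^2$, using the definition of $\tau_t^\perp$ in (55) and the independence of $Z_{t-1}$ and $Z_t$. Further, for $0 \le r \le t-1$,

$$\mathbb{E}\Big[\tau_r Z_r\Big(\frac{\tau_t^2}{\tau_{t-1}} Z_{t-1} + \tau_t^\perp Z_t\Big)\Big] = \frac{\tau_t^2}{\tau_{t-1}^2}\, \tau_r \tau_{t-1} \mathbb{E}[Z_r Z_{t-1}] = \tau_t^2.$$

The existence of the limit of $\frac{1}{L}\sum_{\ell=1}^L \mathbb{E}\{\phi_{k,\ell}(\sum_{u=0}^{t} a_u \tau_u Z_{u\ell}, \sum_{v=0}^{t} b_v \tau_v Z_{v\ell}, \beta_\ell)\}$ for $k = 1$ follows from the law of large numbers; for $k = 2, 3, 4$, the existence of the limit follows from Appendix D.

(c), (d), (e): These are shown by invoking $\mathcal{H}_{t+1}$(b), and are similar to the corresponding results for step $\mathcal{H}_1$.

(f) Using the fourth function in (66), for any $0 \le r, s \le t$, by $\mathcal{H}_{t+1}$(b),

$$\lim \frac{(h^{s+1})^* q^{r+1}}{n} \overset{a.s.}{=} \lim \frac{1}{n}\sum_{\ell=1}^L \mathbb{E}\big\{\tau_s Z_{s\ell}^* [\eta^r_\ell(\beta - \tau_r Z_r) - \beta_\ell]\big\},$$

and the convergence is at rate $n^{-\delta}$. Using arguments very similar to those in $\mathcal{H}_1$(f) (iterated expectations and Stein's lemma), we obtain that

$$\mathbb{E}\big\{\tau_s Z_{s\ell}^* [\eta^r_\ell(\beta - \tau_r Z_r) - \beta_\ell]\big\} = \frac{\tau_s \mathbb{E}[Z_{s1} Z_{r1}]}{\tau_r}\, \mathbb{E}\big\{\|\eta^r_\ell(\beta - \tau_r Z_r)\|^2 - nP_\ell\big\} = \frac{\tau^2_{\max(r,s)}}{\tau_r^2}\, \mathbb{E}\big\{\|\eta^r_\ell(\beta - \tau_r Z_r)\|^2 - nP_\ell\big\}. \tag{123}$$

Here $Z_{s1}, Z_{r1}$ refer to the first entries of the vectors $Z_s, Z_r$, respectively. Using (123) along with the fact that $\frac{1}{n}\mathbb{E}\{\|\eta^r(\beta - \tau_r Z_r)\|^2\} \to P - \sigma^2_{r+1}$ (cf. Appendix D), the limit becomes

$$\lim \frac{(h^{s+1})^* q^{r+1}}{n} = -\frac{\tau^2_{\max(r,s)}}{\tau_r^2}\, \sigma^2_{r+1}.$$

Next, from (45), we observe that

$$\lambda_{r+1} = \frac{1}{\tau_r^2}\Big(\frac{\|\beta^{r+1}\|^2}{n} - P\Big) \overset{a.s.}{\longrightarrow} \lim \frac{1}{\tau_r^2}\Big(\frac{1}{n}\mathbb{E}\|\eta^r(\beta - \tau_r Z_r)\|^2 - P\Big) = -\frac{\sigma^2_{r+1}}{\tau_r^2}, \tag{124}$$

where the convergence at rate $n^{-\delta}$ follows from $\mathcal{H}_{t+1}$(b) applied to the second function in (66). The last equality in (124) is from Appendix D. By $\mathcal{B}_t$(e), $\lim (m^r)^* m^s/n \overset{a.s.}{=} \tau^2_{\max(r,s)}$, which along with (124) completes the proof.

(g) Note that $\gamma^{t+1} = \big(\frac{Q_{t+1}^* Q_{t+1}}{n}\big)^{-1} \frac{Q_{t+1}^* q^{t+1}}{n}$. Similarly to the proof of step $\mathcal{B}_t$(g), the matrix $\frac{Q_{t+1}^* Q_{t+1}}{n}$ can be shown to be invertible with a finite limit using $\mathcal{H}_1$(e)-$\mathcal{H}_t$(e), $\mathcal{H}_1$(h)-$\mathcal{H}_t$(h), Fact 5, and Fact 6. Then use $\mathcal{H}_1$(e)-$\mathcal{H}_{t+1}$(e) to find the value of the limit of $\gamma^{t+1}$.

(h) This result follows similarly to $\mathcal{B}_t$(h), but uses the convergence results $\mathcal{H}_1$(e)-$\mathcal{H}_{t+1}$(e).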
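The Stein's-lemma step used in $\mathcal{H}_{t+1}$(f) relies on the identity $\mathbb{E}[X g(Y)] = \mathbb{E}[XY]\, \mathbb{E}[g'(Y)]$ for jointly Gaussian standard normals $X, Y$ (here $X, Y$ play the roles of $Z_{s1}, Z_{r1}$). The Monte Carlo sketch below checks it for the illustrative choice $g = \tanh$; this scalar function is a stand-in, not the denoiser $\eta$, and the sample size is arbitrary.

```python
import math
import random


def stein_check(rho, n_samples, seed=0):
    """Monte Carlo estimates of both sides of E[X g(Y)] = rho * E[g'(Y)]
    for standard normals X, Y with correlation rho and g = tanh."""
    rng = random.Random(seed)
    lhs = rhs = 0.0
    for _ in range(n_samples):
        y = rng.gauss(0.0, 1.0)
        x = rho * y + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        lhs += x * math.tanh(y)
        rhs += rho / math.cosh(y) ** 2  # g'(y) = sech^2(y)
    return lhs / n_samples, rhs / n_samples


print(stein_check(0.8, 100_000))
```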


Appendix 

A. AMP Derivation 


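In (22), the dependence of $z^t_{a \to i}$ on $i$ is only due to the term $A_{ai}\beta^t_{i \to a}$ being excluded from the sum. Similarly, in (23), the dependence of $\beta^{t+1}_{i \to a}$ on $a$ is due to excluding the term $A_{ai} z^t_{a \to i}$ from the argument. We begin by estimating the order of these excluded terms.

Note that $A_{ai} = O(n^{-1/2})$ and $\beta^t_{i \to a} = O(\sqrt{\log n})$. The latter is true since for $i$ in section $\ell$, $\beta^t_{i \to a} \le \sqrt{nP_\ell}$, where $P_\ell = O(1/L)$ and $L = \Theta(n/\log n)$. Therefore $A_{ai}\beta^t_{i \to a} = O(\sqrt{\log n / n})$. In (23), the excluded term $A_{ai} z^t_{a \to i}$ is $O(n^{-1/2})$ because $z^t_{a \to i} = O(1)$. We set

$$z^t_{a \to i} = z^t_a + \delta z^t_{a \to i}, \qquad \beta^{t+1}_{i \to a} = \beta^{t+1}_i + \delta\beta^{t+1}_{i \to a}. \tag{125}$$

Comparing (125) with (22), we can write

$$z^t_a = y_a - \sum_{j \in [N]} A_{aj}\beta^t_{j \to a}, \qquad \delta z^t_{a \to i} = A_{ai}\beta^t_{i \to a}. \tag{126}$$

For $i \in [N]$, let $\text{sec}(i)$ denote the set of indices in the section containing $i$. To determine $\delta\beta^{t+1}_{i \to a}$, we expand $\eta^t_i$ in (23) in a Taylor series around the argument $\{\sum_{b \in [n]} A_{bj} z^t_{b \to j}\}_{j \in \text{sec}(i)}$, which does not depend on $a$. We thus obtain

$$\beta^{t+1}_{i \to a} \approx \eta^t_i\Big(\Big\{\sum_{b \in [n]} A_{bj} z^t_{b \to j}\Big\}_{j \in \text{sec}(i)}\Big) - \sum_{j \in \text{sec}(i)} A_{aj} z^t_{a \to j}\; \partial_j \eta^t_i\Big(\Big\{\sum_{b \in [n]} A_{bj'} z^t_{b \to j'}\Big\}_{j' \in \text{sec}(i)}\Big), \tag{127}$$

where $\partial_j \eta^t_i(\cdot)$ is the partial derivative of $\eta^t_i$ with respect to the component of the argument corresponding to index $j$. (Recall from (10) that the argument is a length-$M$ vector.) From (10), for $i \in \text{sec}(\ell)$ the partial derivatives can be evaluated as

$$\partial_j \eta^t_i(s) = \begin{cases} \dfrac{\eta^t_i(s)}{\tau_t^2}\big(\sqrt{nP_\ell} - \eta^t_i(s)\big), & j = i, \\[2mm] -\dfrac{\eta^t_i(s)\, \eta^t_j(s)}{\tau_t^2}, & j \in \text{sec}(i) \setminus \{i\}. \end{cases} \tag{128}$$

Using (128) in (127) yields

$$\beta^{t+1}_{i \to a} \approx \eta^t_i(\cdot) - \frac{z^t_a}{\tau_t^2}\, \eta^t_i(\cdot)\Big(A_{ai}\sqrt{nP_\ell} - \sum_{j \in \text{sec}(i)} A_{aj}\, \eta^t_j(\cdot)\Big), \tag{129}$$

where $(\cdot)$ abbreviates the $a$-independent argument $\{\sum_{b \in [n]} A_{bj} z^t_{b \to j}\}_{j \in \text{sec}(i)}$. Notice that we have replaced the stand-alone terms $A_{aj} z^t_{a \to j}$ in (127) with $A_{aj} z^t_a$ because the difference $A_{aj}\,\delta z^t_{a \to j}$ is $O(\sqrt{\log n}/n)$, which can be ignored; we only keep terms as small as $O(n^{-1/2})$.

Since only the second term on the right-hand side of (129) depends on $a$, we can write

$$\beta^{t+1}_i = \eta^t_i\Big(\Big\{\sum_{b \in [n]} A_{bj} z^t_{b \to j}\Big\}_{j \in \text{sec}(i)}\Big), \tag{130}$$

$$\delta\beta^{t+1}_{i \to a} = -\frac{z^t_a}{\tau_t^2}\, \eta^t_i(\cdot)\Big(A_{ai}\sqrt{nP_\ell} - \sum_{j \in \text{sec}(i)} A_{aj}\, \eta^t_j(\cdot)\Big). \tag{131}$$

We observe that $\delta\beta^{t+1}_{i \to a} = O(\log n / \sqrt n)$. Hence, in (126), we can write

$$\delta z^t_{a \to i} = A_{ai}\beta^t_i \tag{132}$$

because the difference $A_{ai}\,\delta\beta^t_{i \to a} = O(\log n / n)$. Substituting (132) in (130), we see that

$$\beta^{t+1}_i = \eta^t_i\Big(\Big\{\sum_{b \in [n]} A_{bj}\big(z^t_b + A_{bj}\beta^t_j\big)\Big\}_{j \in \text{sec}(i)}\Big) \overset{(a)}{=} \eta^t_i\big(\{(A^* z^t)_j + \beta^t_j\}_{j \in \text{sec}(i)}\big), \tag{133}$$

where (a) holds because $\sum_b A_{bj}^2 \to 1$ as $n \to \infty$. Analogously, using (132) in (131) gives

$$\delta\beta^{t+1}_{i \to a} = -\frac{z^t_a}{\tau_t^2}\, \eta^t_i(A^* z^t + \beta^t)\Big(A_{ai}\sqrt{nP_\ell} - \sum_{j \in \text{sec}(i)} A_{aj}\, \eta^t_j(A^* z^t + \beta^t)\Big). \tag{134}$$

Finally, we use (133) and (134) in (126), keeping only the terms involving $A_{ak}^2$, to obtain

$$\begin{aligned}
z^t_a &= y_a - \sum_{k \in [N]} A_{ak}\big(\beta^t_k + \delta\beta^t_{k \to a}\big) \\
&= y_a - \sum_{k \in [N]} A_{ak}\, \eta^{t-1}_k(A^* z^{t-1} + \beta^{t-1}) + \sum_{k \in [N]} A_{ak}^2\, \frac{z^{t-1}_a}{\tau_{t-1}^2}\, \eta^{t-1}_k(A^* z^{t-1} + \beta^{t-1})\Big(\sqrt{nP_{\text{sec}(k)}} - \eta^{t-1}_k(A^* z^{t-1} + \beta^{t-1})\Big) \\
&\overset{(b)}{=} y_a - (A\beta^t)_a + \frac{z^{t-1}_a}{n\tau_{t-1}^2}\big(nP - \|\beta^t\|^2\big),
\end{aligned} \tag{135}$$

where (b) is obtained as follows. First, we use $A_{ak}^2 \approx 1/n$. Next, (10) implies that for all $s$,

$$\sum_{k \in [N]} \sqrt{nP_{\text{sec}(k)}}\; \eta^{t-1}_k(s) = \sum_{\ell=1}^L nP_\ell = nP.$$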
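The updates (133) and (135) can be sketched as a small pure-Python AMP decoder. This is a minimal illustration under stated assumptions, not the authors' implementation: $\eta$ is taken to be the section-wise softmax of (10), $\tau_t^2$ is estimated empirically as $\|z^t\|^2/n$ (rather than computed from state evolution), the design matrix has i.i.d. $\mathcal{N}(0, 1/n)$ entries, and the sizes and power values in the example are arbitrary. The names `eta` and `amp_decode` are hypothetical.

```python
import math
import random


def eta(s, L, M, sqrt_nP, tau2):
    """Section-wise softmax denoiser applied to the full length-LM vector s."""
    out = []
    for l in range(L):
        a = [x * sqrt_nP[l] / tau2 for x in s[l * M:(l + 1) * M]]
        mx = max(a)
        e = [math.exp(v - mx) for v in a]
        tot = sum(e)
        out.extend(sqrt_nP[l] * v / tot for v in e)
    return out


def amp_decode(A, y, L, M, P_l, T):
    """T iterations of z^t = y - A beta^t + z^{t-1}(P - ||beta^t||^2/n)/tau_{t-1}^2
    and beta^{t+1} = eta^t(beta^t + A^* z^t). ASSUMPTION: tau_t^2 ~ ||z^t||^2 / n."""
    n, N = len(y), L * M
    P = sum(P_l)
    sqrt_nP = [math.sqrt(n * p) for p in P_l]
    beta = [0.0] * N
    z_prev = tau2_prev = None
    for _ in range(T):
        z = [y[a] - sum(A[a][i] * beta[i] for i in range(N)) for a in range(n)]
        if z_prev is not None:  # Onsager correction term from (135)
            c = (P - sum(b * b for b in beta) / n) / tau2_prev
            z = [za + c * zp for za, zp in zip(z, z_prev)]
        tau2 = sum(v * v for v in z) / n
        s = [beta[i] + sum(A[a][i] * z[a] for a in range(n)) for i in range(N)]
        beta = eta(s, L, M, sqrt_nP, tau2)
        z_prev, tau2_prev = z, tau2
    return beta


# Small synthetic example (all sizes and powers are arbitrary choices).
rng = random.Random(0)
L, M, n, sigma = 4, 4, 32, 0.5
P_l = [0.5, 0.4, 0.3, 0.2]
A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(L * M)] for _ in range(n)]
beta0 = [0.0] * (L * M)
for l in range(L):
    beta0[l * M + rng.randrange(M)] = math.sqrt(n * P_l[l])
y = [sum(A[a][i] * beta0[i] for i in range(L * M)) + rng.gauss(0.0, sigma) for a in range(n)]
beta_hat = amp_decode(A, y, L, M, P_l, T=10)
```

Two properties that follow from (10) are easy to confirm on the output: each section of the estimate sums to $\sqrt{nP_\ell}$, and $\sum_k \sqrt{nP_{\text{sec}(k)}}\, \eta_k(s) = nP$, the identity used in step (b) of (135).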


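Finally, note from (133) that $\sum_k \big(\eta^{t-1}_k(A^* z^{t-1} + \beta^{t-1})\big)^2 = \sum_k (\beta^t_k)^2 = \|\beta^t\|^2$. The AMP update equations are thus given by (135) and (133).


B. Proof of Lemma 1

From (26), $x(\tau)$ can be written as

$$x(\tau) = \sum_{\ell=1}^L \frac{P_\ell}{P}\, \mathcal{E}_\ell(\tau), \tag{136}$$

where

$$\mathcal{E}_\ell(\tau) = \mathbb{E}\left[\frac{\exp\Big(\frac{\sqrt{nP_\ell}}{\tau}\big(U^\ell_1 + \frac{\sqrt{nP_\ell}}{\tau}\big)\Big)}{\exp\Big(\frac{\sqrt{nP_\ell}}{\tau}\big(U^\ell_1 + \frac{\sqrt{nP_\ell}}{\tau}\big)\Big) + \sum_{j=2}^M \exp\Big(\frac{\sqrt{nP_\ell}}{\tau} U^\ell_j\Big)}\right]. \tag{137}$$

The result needs to be proved only for $\xi^* > 0$. (For brevity, we suppress the dependence of $\xi^*$ on $\tau$.) Since $P_\ell$ is non-increasing with $\ell$, it is enough$^4$ to prove that for $\xi \in (0,1]$,

$$\lim_{L \to \infty} \mathcal{E}_{\lfloor \xi L \rfloor}(\tau) = \begin{cases} 1, & \xi < \xi^*, \\ 0, & \xi > \xi^*. \end{cases} \tag{138}$$

Using the relation $nR = L \ln M / \ln 2$, we can write

$$\frac{nP_{\lfloor \xi L \rfloor}}{\tau^2} = \nu^L_{\lfloor \xi L \rfloor} \ln M, \qquad \text{where} \quad \nu^L_{\lfloor \xi L \rfloor} = \frac{L P_{\lfloor \xi L \rfloor}}{R\tau^2 \ln 2}.$$

From the definition of $\xi^*$ in the lemma statement and the non-increasing power allocation, we see that $\lim \nu^L_{\lfloor \xi L \rfloor} > 2$ for $\xi < \xi^*$, and $\lim \nu^L_{\lfloor \xi L \rfloor} < 2$ for $\xi > \xi^*$.

For brevity, in what follows we drop the superscripts on $U^{\lfloor \xi L \rfloor}_j$ and denote it by $U_j$ for $j \in [M]$, and we write $\nu := \nu^L_{\lfloor \xi L \rfloor}$. From (137), $\mathcal{E}_{\lfloor \xi L \rfloor}(\tau)$ can be written as

$$\mathcal{E}_{\lfloor \xi L \rfloor}(\tau) = \mathbb{E}\left[\frac{e^{\sqrt{\nu \ln M}\, U_1 + \nu \ln M}}{e^{\sqrt{\nu \ln M}\, U_1 + \nu \ln M} + \sum_{j=2}^M e^{\sqrt{\nu \ln M}\, U_j}}\right] = \mathbb{E}\left[\mathbb{E}\left[\frac{e^{\sqrt{\nu \ln M}\, U_1}}{e^{\sqrt{\nu \ln M}\, U_1} + M^{-\nu}\sum_{j=2}^M e^{\sqrt{\nu \ln M}\, U_j}} \,\middle|\, U_1\right]\right]. \tag{139}$$

The inner expectation in (139) is of the form

$$\mathbb{E}_X\Big[\frac{c}{c + X}\Big], \tag{140}$$

where $c = \exp(\sqrt{\nu \ln M}\, U_1)$ is treated as a positive constant, and the expectation is with respect to the random variable

$$X := M^{-\nu}\sum_{j=2}^M \exp\big(\sqrt{\nu \ln M}\, U_j\big). \tag{141}$$

$^4$ We can also determine $\lim_{L \to \infty} \mathcal{E}_{\lfloor \xi^* L \rfloor}(\tau)$, but we do not need this for the exponentially decaying power allocation, since it will only affect a vanishing fraction of sections as $L$ increases. Since $\mathcal{E}_\ell \in [0,1]$, these sections do not affect the value of $\lim x(\tau)$ in (136).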


Case 1: £ < £*. Here we have limt/^^j > 2. Since is 
a convex function of X, applying Jensen’s inequality we get 
l^A'Ijqrx] — c+ex ■ The expectation of X is 


M 


EA = M~At L J Y K In MUj 

3 =2 

( =^ M~ V L^J (M - l)M ! 'LSij/ 2 < 


with (a) is obtained from the moment generating function of 
a Gaussian random variable. Therefore, 

> c > c 

“ c + EA - c + M 1 _i/ L«gi/ 2 

1 

1 +C- 1 M 1 ~ v L^j/ 2 ' 

(142) 


1>E X 


A 


Recalling that c = exp ln M U\), \\A2) implies that 


E 


x 


In MU! 

\/ u iiL\ In M Ui _|_ y 

1 


C7i 


(143) 


1 _|_ M l ~ v u-f-j/ 2 lnMC/ i 

When {Z7i > — (InM) 1 / 4 }, the RHS of ( |143[ > is at least [1 + 
M 1-I 'L« i J/ 2 exp ((lnM) 3 / 4 ^ LUd)] _1 - Using this in ( | 1 39[ >, 


we obtain that 

1 > £\£l\ (t) 


> 


P(Ui > —(InM) 1 / 4 ) 


M—fOO 


(144) 


1 -|_ e (lnM)3/4 v /I T7U' 

since limi/i^m > 2. Hence ^[jlj —>• 1 when limz/^Lj > 2. 

Case 2: £ > £*. Here we have lim //1 pr < 2. The random 
variable A in ( |141[ ) can be bounded from below as follows. 

A > M~ v ^ L i max e^L^j 

_ ( 14 5) 

= M~ v Le-r-J g[ max 3'e{2,...,M> ln M. 


Using standard bounds for the standard normal distribution, it 
can be shown that 

P ( max ^ < V21nM(l - e) ) < , (146) 
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for e = u () j^] Combining ( |146| l and ( |145| ), we obtain 
that 


exp(—M e(1 “ e) ) > P ( max U, < V21nM(l-e) ) 

> P ^X < e ' /21nM ( 1_e )V l/ L«iJ InM^ 

= P ^X < Afv /2l/ LSij (i-'O-i'Lez.j ^ _ 

Since limi/igm < 2 and e > 0 can be an arbitrarily small 
constant, there exists a strictly positive constant <5 such that 
S < (1 — e) — f° r a U sufficiently large L. 

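Therefore, for sufficiently large $M$, the expectation in (140) can be bounded as

$$\mathbb{E}\Big[\frac{c}{c+X}\,\Big|\,U_1\Big] \le \mathbb{P}(X < M^{\delta})\cdot 1 + \mathbb{P}(X \ge M^{\delta})\,\frac{c}{c+M^{\delta}} \le \exp\big(-M^{\epsilon(1-\epsilon)}\big) + \frac{1}{1+c^{-1}M^{\delta}}, \tag{147}$$

where the second inequality uses the preceding display together with $M^{\delta} < M^{(1-\epsilon)\sqrt{2\nu_{\lfloor \xi L\rfloor}}-\nu_{\lfloor \xi L\rfloor}}$. Recalling that $c = \exp\big(\sqrt{\nu_{\lfloor \xi L\rfloor}\ln M}\,U_1\big)$, and using the bound (147) in (139), we obtain

$$\begin{aligned}
\mathcal{E}_{\lfloor \xi L\rfloor}(\tau) &\le \mathbb{E}\bigg[\frac{1}{1+M^{\delta}e^{-\sqrt{\nu_{\lfloor \xi L\rfloor}\ln M}\,U_1}}\bigg] + \exp\big(-M^{\epsilon(1-\epsilon)}\big)\\
&\le \mathbb{P}\big(U_1 > (\ln M)^{1/4}\big) + \exp\big(-M^{\epsilon(1-\epsilon)}\big) + \frac{1}{1+M^{\delta}e^{-\sqrt{\nu_{\lfloor \xi L\rfloor}}\,(\ln M)^{3/4}}}\\
&\overset{(a)}{\le} 2e^{-(\ln M)^{1/2}/2} + \frac{1}{1+e^{\delta\ln M-\sqrt{\nu_{\lfloor \xi L\rfloor}}\,(\ln M)^{3/4}}}\\
&\overset{(b)}{\longrightarrow} 0 \quad \text{as } M\to\infty.
\end{aligned} \tag{148}$$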


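In (148), (a) is obtained using the bound $\Phi(-x) \le \exp(-x^2/2)$ for $x > 0$, where $\Phi(\cdot)$ is the Gaussian cdf, together with $\exp(-M^{\epsilon(1-\epsilon)}) \le e^{-(\ln M)^{1/2}/2}$ for large $M$; (b) holds since $\delta$ and $\lim \nu_{\lfloor \xi L\rfloor}$ are both positive constants.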
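Both ingredients of (148) can be sanity-checked numerically: the tail bound $\Phi(-x) \le e^{-x^2/2}$ on a grid of $x > 0$, and the final bound as $\ln M$ grows. A sketch with our own illustrative values $\nu = 1$ (below the threshold 2) and $\delta = 0.3$, which satisfies $\delta < (1-\epsilon)\sqrt{2\nu}-\nu \approx 0.34$ for $\epsilon = 0.05$; the helper names are hypothetical.

```python
import math


def tail_bound_holds(xs):
    """Check Phi(-x) <= exp(-x^2/2) at each x > 0, with Phi(-x) = erfc(x/sqrt(2))/2."""
    return all(0.5 * math.erfc(x / math.sqrt(2.0)) <= math.exp(-x * x / 2.0) for x in xs)


def case2_upper_bound(log_m, nu, delta):
    """Last bound in (148): 2 exp(-sqrt(ln M)/2) + 1/(1 + exp(delta ln M - sqrt(nu) (ln M)^{3/4}))."""
    first = 2.0 * math.exp(-math.sqrt(log_m) / 2.0)
    expo = delta * log_m - math.sqrt(nu) * log_m ** 0.75
    # Evaluate the logistic term 1/(1 + e^expo) without overflow for large |expo|.
    if expo > 0:
        second = math.exp(-expo) / (1.0 + math.exp(-expo))
    else:
        second = 1.0 / (1.0 + math.exp(expo))
    return first + second


print(tail_bound_holds([0.1 * k for k in range(1, 101)]))
for log_m in (10.0, 1e3, 1e4):
    print(log_m, case2_upper_bound(log_m, nu=1.0, delta=0.3))
```

At $\ln M = 10$ the bound exceeds 1 and says nothing; it only becomes small for very large $M$, in line with the asymptotic nature of the argument.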

This proves that $\mathcal{E}_{\lfloor \xi L\rfloor}(\tau) \to 0$ when $\lim \nu_{\lfloor \xi L\rfloor} < 2$. The proof of the lemma is complete since we have proved both statements in (138).
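The dichotomy in (138) can be illustrated by Monte Carlo estimation of $\mathbb{E}[c/(c+X)]$ from (139)-(141) for one value of $\nu$ above 2 and one below. This is a sketch only: `section_success_prob` is a hypothetical name, $M = 1024$ and the trial count are arbitrary, and since the lemma concerns $M \to \infty$ (with convergence driven by $\ln M$), finite-$M$ estimates are only expected to lean towards 1 or 0, not to equal them.

```python
import math
import random


def section_success_prob(M, nu, trials, seed=1):
    """Monte Carlo estimate of E[c/(c+X)] with c = exp(a U_1),
    X = M^{-nu} sum_{j=2}^M exp(a U_j), a = sqrt(nu ln M), U_j i.i.d. N(0,1)."""
    rng = random.Random(seed)
    log_m = math.log(M)
    a = math.sqrt(nu * log_m)
    total = 0.0
    for _ in range(trials):
        u1 = rng.gauss(0.0, 1.0)
        # c/(c+X) = 1/(1 + X/c), and X/c = sum_j exp(a (U_j - U_1) - nu ln M).
        ratio = sum(math.exp(a * (rng.gauss(0.0, 1.0) - u1) - nu * log_m) for _ in range(M - 1))
        total += 1.0 / (1.0 + ratio)
    return total / trials


for nu in (1.0, 4.0):
    print(nu, section_success_prob(M=1024, nu=nu, trials=200))
```

Rewriting $c/(c+X)$ as $1/(1+X/c)$ keeps every exponent moderate, so no overflow guard is needed.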


C. Proof of Lemma 2

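For brevity, let $\xi_t := \xi^*(\tau_t)$ for $t \ge 0$, where $\xi^*(\cdot)$ is defined in Lemma 1. For $t = 0$, $\tau_0^2 = \sigma^2 + P$. Then, from Lemma 1 we obtain

$$x_1 = \lim_{L\to\infty}\sum_{\ell=1}^{\lfloor \xi_0 L\rfloor}\frac{P_\ell}{P},$$

where $\xi_0$ is the supremum of all $\xi \in (0,1]$ that satisfy

$$\lim_{L\to\infty} L P_{\lfloor \xi L\rfloor} = \sigma^2(1+\mathsf{snr})^{1-\xi}\ln(1+\mathsf{snr}) > 2R(\sigma^2+P)\ln 2. \tag{149}$$

The first equality in (149) is due to (29). Dividing both sides of (149) by $\sigma^2(1+\mathsf{snr}) = \sigma^2+P$ and using $\ln(1+\mathsf{snr}) = 2C\ln 2$ yields $2^{-2C\xi} > R/C$, i.e., the condition $\xi < \frac{1}{2C}\log_2(C/R)$, from which it follows that the supremum is $\xi_0 = \min\{\frac{1}{2C}\log_2(C/R),\,1\}$.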

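⁵Recall that $f(n) = \omega(g(n))$ if for each $k > 0$, $|f(n)|/|g(n)| \ge k$ for all sufficiently large $n$.

Using the geometric series formula $\sum_{\ell=1}^{k} P_\ell = (P+\sigma^2)(1-2^{-2Ck/L})$, the expression for $x_1$ becomes

$$x_1 = \lim_{L\to\infty}\sum_{\ell=1}^{\lfloor \xi_0 L\rfloor}\frac{P_\ell}{P} = \lim_{L\to\infty}\frac{P+\sigma^2}{P}\Big(1-2^{-2C\lfloor \xi_0 L\rfloor/L}\Big) = \frac{(1+\mathsf{snr})-(1+\mathsf{snr})^{1-\xi_0}}{\mathsf{snr}}.$$

The expression for $\tau_1^2$, namely $\tau_1^2 = \sigma^2(1+\mathsf{snr})^{1-\xi_0}$, is a straightforward simplification of $\tau_1^2 = \sigma^2 + P(1 - x_1)$.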
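The geometric-series identity and the limit in (149) can be checked numerically for a finite number of sections. The sketch below takes the section powers to be those implied by the partial-sum formula above, $P_\ell = (P+\sigma^2)(2^{2C/L}-1)\,2^{-2C\ell/L}$ (obtained by differencing consecutive partial sums), with $C = \frac{1}{2}\log_2(1+\mathsf{snr})$; the values $P = 15$, $\sigma^2 = 1$, $L = 10^5$, $\xi = 0.25$ and the helper name `section_powers` are our own illustrative choices.

```python
import math


def section_powers(P, sigma2, L):
    """Section powers P_1, ..., P_L implied by the partial-sum formula
    sum_{l=1}^{k} P_l = (P + sigma2)(1 - 2^{-2Ck/L}), with C = 0.5 log2(1 + P/sigma2)."""
    C = 0.5 * math.log2(1.0 + P / sigma2)
    # P_l = S_l - S_{l-1} = (P + sigma2)(2^{2C/L} - 1) 2^{-2C l / L}
    powers = [(P + sigma2) * (2.0 ** (2 * C / L) - 1.0) * 2.0 ** (-2 * C * l / L)
              for l in range(1, L + 1)]
    return powers, C


P, sigma2, L, xi = 15.0, 1.0, 100_000, 0.25
snr = P / sigma2
powers, C = section_powers(P, sigma2, L)
k = math.floor(xi * L)                     # number of sections up to floor(xi L)
partial = sum(powers[:k])
formula = (P + sigma2) * (1.0 - 2.0 ** (-2 * C * k / L))
x1_closed = ((1 + snr) - (1 + snr) ** (1 - xi)) / snr
lp_limit = sigma2 * (1 + snr) ** (1 - xi) * math.log(1 + snr)
print(sum(powers), partial, formula)       # total power, partial sum, closed form
print(partial / P, x1_closed)              # finite-L x_1 vs limiting expression
print(L * powers[k - 1], lp_limit)         # L * P_{floor(xi L)} vs its limit in (149)
```

The value $\xi = 0.25$ is chosen so that $\xi L$ is an exact integer in floating point, which keeps $\lfloor \xi L\rfloor$ unambiguous.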

Assume towards induction that (30) and (31) hold for $x_t$ and $\tau_t^2$. For step $(t+1)$, from Lemma 1,

$$x_{t+1} = \lim_{L\to\infty}\sum_{\ell=1}^{\lfloor \xi_t L\rfloor}\frac{P_\ell}{P},$$

where $\xi_t$ is the supremum of all $\xi \in (0,1]$ that satisfy

$$\lim_{L\to\infty} L P_{\lfloor \xi L\rfloor} = \sigma^2(1+\mathsf{snr})^{1-\xi}\ln(1+\mathsf{snr}) > 2R\tau_t^2\ln 2. \tag{150}$$

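Using the expression $\tau_t^2 = \sigma^2(1+\mathsf{snr})^{1-\xi_{t-1}}$ from (31) (due to the induction hypothesis) and simplifying (150) in the same way as (149) yields the condition $\xi < \xi_{t-1} + \frac{1}{2C}\log_2(C/R)$. Hence the supremum is $\xi_t = \min\{\xi_{t-1} + \frac{1}{2C}\log_2(C/R),\,1\}$. It follows that

$$x_{t+1} = \lim_{L\to\infty}\sum_{\ell=1}^{\lfloor \xi_t L\rfloor}\frac{P_\ell}{P} = \lim_{L\to\infty}\frac{P+\sigma^2}{P}\Big(1-2^{-2C\lfloor \xi_t L\rfloor/L}\Big) = \frac{(1+\mathsf{snr})-(1+\mathsf{snr})^{1-\xi_t}}{\mathsf{snr}}. \tag{151}$$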


The proof is concluded by using (151) to compute $\tau_{t+1}^2 = \sigma^2 + P(1 - x_{t+1})$.
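The closed-form recursion of Lemma 2 can be compared with a direct large-$L$ evaluation of the threshold rule in (149)-(150), in which the sections with $LP_\ell > 2R\tau_t^2\ln 2$ are counted as decoded. This is a sketch under stated assumptions: `closed_form_se` and `large_L_se` are hypothetical names, the finite-$L$ threshold rule is only a proxy for the $L\to\infty$ limit in Lemma 1, and the parameters ($\mathsf{snr} = 15$, so $C = 2$, and $R = 1$) are chosen so that $\xi_t$ increases by exactly $1/4$ per iteration.

```python
import math


def closed_form_se(P, sigma2, R, t_max):
    """(xi_t, x_{t+1}, tau_{t+1}^2) from Lemma 2's closed forms, starting from xi_{-1} = 0."""
    snr = P / sigma2
    C = 0.5 * math.log2(1.0 + snr)
    step = math.log2(C / R) / (2.0 * C)  # increment of xi per iteration
    xi, out = 0.0, []
    for _ in range(t_max):
        xi = min(xi + step, 1.0)
        x = ((1 + snr) - (1 + snr) ** (1 - xi)) / snr  # (151)
        out.append((xi, x, sigma2 + P * (1 - x)))      # tau^2 = sigma^2 + P(1 - x)
        if xi >= 1.0:
            break
    return out


def large_L_se(P, sigma2, R, L, t_max):
    """Finite-L proxy: a section counts as decoded when L * P_l > 2 R tau_t^2 ln 2."""
    snr = P / sigma2
    C = 0.5 * math.log2(1.0 + snr)
    powers = [(P + sigma2) * (2.0 ** (2 * C / L) - 1.0) * 2.0 ** (-2 * C * l / L)
              for l in range(1, L + 1)]
    tau2, out = sigma2 + P, []
    for _ in range(t_max):
        x = sum(p for p in powers if L * p > 2 * R * tau2 * math.log(2)) / P
        tau2 = sigma2 + P * (1 - x)
        out.append((x, tau2))
    return out


cf = closed_form_se(P=15.0, sigma2=1.0, R=1.0, t_max=10)
approx = large_L_se(P=15.0, sigma2=1.0, R=1.0, L=4000, t_max=len(cf))
for (xi, x, tau2), (x_a, tau2_a) in zip(cf, approx):
    print(f"xi={xi:.2f}  tau^2 closed={tau2:.4f}  large-L={tau2_a:.4f}")
```

With these parameters both computations give $\tau_t^2 = 8, 4, 2, 1$, and $\xi_t$ reaches 1 after four iterations.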


D. The limit of $\frac{1}{n}\mathbb{E}\{[\eta^r(\beta - \tau_r Z_r) - \beta]^*[\eta^s(\beta - \tau_s Z_s) - \beta]\}$ equals $\sigma_{s+1}^2$ for $-1 \le r \le s \le t$

Noting that $\|\beta\|^2 = nP$, we prove that the desired limit

$$\lim\Big[\frac{1}{n}\mathbb{E}\{[\eta^r(\beta-\tau_r Z_r)]^*[\eta^s(\beta-\tau_s Z_s)]\} - \frac{1}{n}\mathbb{E}\{\beta^*\eta^r(\beta-\tau_r Z_r)\} - \frac{1}{n}\mathbb{E}\{\beta^*\eta^s(\beta-\tau_s Z_s)\} + P\Big] \tag{152}$$

equals $\sigma_{s+1}^2 = \sigma^2\big((1+\mathsf{snr})^{1-\xi_s}-1\big)$. For the case $r = s = -1$ the result holds since $\sigma_0^2 = P$, so assume $s > -1$. To obtain (152), we show the following: for $0 \le r \le t$,

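$$\lim\frac{1}{n}\mathbb{E}\{\beta^*\eta^r(\beta-\tau_r Z_r)\} = \tau_0^2 - \tau_{r+1}^2, \tag{153}$$

and for $0 \le r \le s \le t$,

$$\lim\frac{1}{n}\mathbb{E}\{[\eta^r(\beta-\tau_r Z_r)]^*[\eta^s(\beta-\tau_s Z_s)]\} = \tau_0^2 - \tau_{r+1}^2. \tag{154}$$

The above results are all trivially true if $r = -1$.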
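The bookkeeping that turns (153) and (154) into the claimed value of (152) can be written out term by term. This sketch takes the limits in (153) and (154) to be $\tau_0^2-\tau_{r+1}^2$ (so that (153) with index $s$ gives $\tau_0^2-\tau_{s+1}^2$), uses $\tau_0^2 = \sigma^2+P$, and uses Lemma 2's value $\tau_{s+1}^2 = \sigma^2(1+\mathsf{snr})^{1-\xi_s}$; here $\eta^r$ abbreviates $\eta^r(\beta-\tau_r Z_r)$:

$$
\begin{aligned}
&\lim\Big[\tfrac{1}{n}\mathbb{E}\{[\eta^r]^*[\eta^s]\}-\tfrac{1}{n}\mathbb{E}\{\beta^*\eta^r\}-\tfrac{1}{n}\mathbb{E}\{\beta^*\eta^s\}+P\Big]\\
&\quad=(\tau_0^2-\tau_{r+1}^2)-(\tau_0^2-\tau_{r+1}^2)-(\tau_0^2-\tau_{s+1}^2)+P\\
&\quad=\tau_{s+1}^2-\sigma^2=\sigma^2\big((1+\mathsf{snr})^{1-\xi_s}-1\big).
\end{aligned}
$$

The cross term cancels the $r$-th overlap term, so only the $s$-dependence of the third term survives.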

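We first show (153). Since $\beta$ is distributed uniformly over the set $\mathcal{B}_{M,L}$, the expectation in (153) can be computed by assuming that $\beta$ has a non-zero in the first entry of each section. Thus

$$\begin{aligned}
\lim\frac{1}{n}\mathbb{E}\{\beta^*\eta^r(\beta-\tau_r Z_r)\}
&= \lim\sum_{\ell=1}^{L} P_\ell\,\mathbb{E}\left[\frac{e^{\frac{\sqrt{nP_\ell}}{\tau_r}\left(U_1^\ell+\frac{\sqrt{nP_\ell}}{\tau_r}\right)}}{e^{\frac{\sqrt{nP_\ell}}{\tau_r}\left(U_1^\ell+\frac{\sqrt{nP_\ell}}{\tau_r}\right)}+\sum_{j=2}^{M}e^{\frac{\sqrt{nP_\ell}}{\tau_r}U_j^\ell}}\right]\\
&\overset{(a)}{=} \lim\sum_{\ell=1}^{\lfloor \xi_r L\rfloor} P_\ell = \sigma^2\big((1+\mathsf{snr})-(1+\mathsf{snr})^{1-\xi_r}\big) \overset{(b)}{=} \tau_0^2-\tau_{r+1}^2.
\end{aligned} \tag{155}$$

In (155), $\{U_j^\ell\}$ with $\ell \in [L]$, $j \in [M]$ is a relabeled version of $-Z_r$, and is thus i.i.d. $\mathcal{N}(0,1)$. Equalities (a) and (b) are obtained from Lemmas 1 and 2 (cf. (26), (27), and (31)).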


Consider result (154). From the proof of Proposition 1 (noting that $\beta^{r+1} = \eta^r(\beta - \tau_r Z_r)$; cf. (…8) and (…9)), it follows that

$$\frac{1}{n}\mathbb{E}\|\eta_\ell^r(\beta-\tau_r Z_r)\|^2 = \frac{1}{n}\mathbb{E}\{\beta_\ell^*\eta_\ell^r(\beta-\tau_r Z_r)\}, \quad \ell \in [L], \tag{156}$$

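which proves the result if $r = s$. For $r < s$, we obtain the result by showing that

$$\lim\frac{1}{n}\mathbb{E}\{[\eta^r(\beta-\tau_r Z_r)]^*[\eta^s(\beta-\tau_s Z_s)]\} \le \lim\sum_{\ell=1}^{\lfloor \xi_r L\rfloor} P_\ell, \tag{157}$$

$$\lim\frac{1}{n}\mathbb{E}\{[\eta^r(\beta-\tau_r Z_r)]^*[\eta^s(\beta-\tau_s Z_s)]\} \ge \lim\sum_{\ell=1}^{\lfloor \xi_r L\rfloor} P_\ell. \tag{158}$$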

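We then get the desired result by observing that the limit on the RHS above equals $\tau_0^2 - \tau_{r+1}^2$, as in (155). For the upper bound (157), we have

$$\begin{aligned}
\lim\frac{1}{n}\mathbb{E}\{[\eta^r(\beta-\tau_r Z_r)]^*[\eta^s(\beta-\tau_s Z_s)]\}
&= \lim\frac{1}{n}\sum_{\ell=1}^{L}\mathbb{E}\{[\eta_\ell^r(\beta-\tau_r Z_r)]^*[\eta_\ell^s(\beta-\tau_s Z_s)]\}\\
&\overset{(a)}{\le} \lim\frac{1}{n}\sum_{\ell=1}^{L}\big(\mathbb{E}\|\eta_\ell^r(\beta-\tau_r Z_r)\|^2\big)^{1/2}\big(\mathbb{E}\|\eta_\ell^s(\beta-\tau_s Z_s)\|^2\big)^{1/2}\\
&\overset{(b)}{=} \lim\sum_{\ell=1}^{L} P_\ell\sqrt{\mathcal{E}_\ell(\tau_r)\,\mathcal{E}_\ell(\tau_s)} \overset{(c)}{=} \lim\sum_{\ell=1}^{\lfloor \xi_r L\rfloor} P_\ell,
\end{aligned} \tag{159}$$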


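where (a) is obtained using the Cauchy-Schwarz inequality; (b) follows from (156), (155), and the definition of $\mathcal{E}_\ell(\cdot)$ in (137); and (c) is obtained as follows. Consider $\mathcal{E}_{\lfloor \xi L\rfloor}(\tau_r)$ and $\mathcal{E}_{\lfloor \xi L\rfloor}(\tau_s)$ for some $\xi \in (0,1]$. It follows from the proofs of Lemmas 1 and 2 that

$$\lim \mathcal{E}_{\lfloor \xi L\rfloor}(\tau_r) = \begin{cases} 1, & \xi < \xi_r,\\ 0, & \xi > \xi_r,\end{cases} \qquad \text{and} \qquad \lim \mathcal{E}_{\lfloor \xi L\rfloor}(\tau_s) = \begin{cases} 1, & \xi < \xi_s,\\ 0, & \xi > \xi_s,\end{cases}$$

where $\xi_r$ and $\xi_s$ are as defined in Lemma 2. Since $r < s$, we have $\xi_r \le \xi_s$, which yields (c) in (159).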


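For the lower bound (158), since $\beta$ is distributed uniformly over the set $\mathcal{B}_{M,L}$, the expectation in (154) can be computed by assuming that $\beta$ has a non-zero in the first entry of each section:

$$\frac{1}{n}\mathbb{E}\{[\eta^r(\beta-\tau_r Z_r)]^*[\eta^s(\beta-\tau_s Z_s)]\} = \frac{1}{n}\sum_{\ell}\mathbb{E}\{[\eta_\ell^r(\beta-\tau_r Z_r)]^*[\eta_\ell^s(\beta-\tau_s Z_s)]\} = \sum_{\ell} P_\ell\,\mathcal{E}_{rs,\ell}, \tag{160}$$

where

$$\mathcal{E}_{rs,\ell} = \mathbb{E}\left[\frac{e^{b_{r\ell}^2+b_{r\ell}U_{r1}}\,e^{b_{s\ell}^2+b_{s\ell}U_{s1}}+\sum_{j=2}^{M}e^{b_{r\ell}U_{rj}}\,e^{b_{s\ell}U_{sj}}}{\Big(e^{b_{r\ell}^2+b_{r\ell}U_{r1}}+\sum_{j=2}^{M}e^{b_{r\ell}U_{rj}}\Big)\Big(e^{b_{s\ell}^2+b_{s\ell}U_{s1}}+\sum_{j=2}^{M}e^{b_{s\ell}U_{sj}}\Big)}\right], \tag{161}$$

with $b_{r\ell}^2 := nP_\ell/\tau_r^2$ and $b_{s\ell}^2 := nP_\ell/\tau_s^2$. In (161), the pairs of random variables $\{(U_{rj}, U_{sj})\}$, $j \in [M]$, are i.i.d. across the index $j$, and for each $j$, $U_{rj}$ and $U_{sj}$ are jointly Gaussian with $\mathcal{N}(0,1)$ marginals and covariance $\tau_s/\tau_r$.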


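Consider the expectation using just the first term in the numerator on the right-hand side of (161). This can be written as

$$\begin{aligned}
&\mathbb{E}\left[\mathbb{E}\left[\frac{e^{b_{r\ell}^2+b_{r\ell}U_{r1}}}{e^{b_{r\ell}^2+b_{r\ell}U_{r1}}+\sum_{j=2}^{M}e^{b_{r\ell}U_{rj}}}\cdot\frac{e^{b_{s\ell}^2+b_{s\ell}U_{s1}}}{e^{b_{s\ell}^2+b_{s\ell}U_{s1}}+\sum_{j=2}^{M}e^{b_{s\ell}U_{sj}}}\,\middle|\,U_{r1},U_{s1}\right]\right]\\
&\quad\overset{(a)}{\ge} \mathbb{E}\left[\frac{1}{1+Me^{-b_{r\ell}^2/2-b_{r\ell}U_{r1}}}\cdot\frac{1}{1+Me^{-b_{s\ell}^2/2-b_{s\ell}U_{s1}}}\right]\\
&\quad\ge \mathbb{P}\big(U_{r1} > -b_{r\ell}^{1/2},\,U_{s1} > -b_{s\ell}^{1/2}\big)\,\frac{1}{1+Me^{-b_{r\ell}^2/2+b_{r\ell}^{3/2}}}\cdot\frac{1}{1+Me^{-b_{s\ell}^2/2+b_{s\ell}^{3/2}}}\\
&\quad\overset{(b)}{\longrightarrow} 1 \quad \text{as } M\to\infty,\ \text{for } 1 \le \ell < \lfloor \xi_r L\rfloor.
\end{aligned} \tag{162}$$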


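In (162), (a) is obtained as follows. Conditionally on $(U_{r1}, U_{s1})$, the inner expectation on the first line is of the form $\mathbb{E}_{X,Y}[f(X,Y)]$ with $f(X,Y) = \frac{k_1}{k_1+X}\cdot\frac{k_2}{k_2+Y}$, where $k_1 = e^{b_{r\ell}^2+b_{r\ell}U_{r1}}$ and $k_2 = e^{b_{s\ell}^2+b_{s\ell}U_{s1}}$ are positive constants, $X = \sum_{j=2}^{M}e^{b_{r\ell}U_{rj}}$, and $Y = \sum_{j=2}^{M}e^{b_{s\ell}U_{sj}}$; $(X,Y)$ is independent of $(U_{r1},U_{s1})$ since the pairs $(U_{rj},U_{sj})$ are i.i.d. across $j$. Since $f$ is a convex function of $(X,Y)$, Jensen's inequality implies $\mathbb{E}[f(X,Y)] \ge f(\mathbb{E}X, \mathbb{E}Y)$, with $\mathbb{E}[\exp(b_{r\ell}U_{rj})] = \exp(\frac{1}{2}b_{r\ell}^2)$ and similarly for $b_{s\ell}U_{sj}$; bounding $M-1$ by $M$ gives the second line.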
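The joint convexity of $f$ invoked in (a) can be verified from its Hessian. A short derivation, writing $x$ and $y$ for the arguments on the domain $x > -k_1$, $y > -k_2$ (which covers $X, Y \ge 0$):

$$
\begin{aligned}
f_{xx} &= \frac{2k_1k_2}{(k_1+x)^3(k_2+y)}, \qquad
f_{yy} = \frac{2k_1k_2}{(k_1+x)(k_2+y)^3}, \qquad
f_{xy} = \frac{k_1k_2}{(k_1+x)^2(k_2+y)^2},\\
f_{xx}f_{yy}-f_{xy}^2 &= \frac{4k_1^2k_2^2-k_1^2k_2^2}{(k_1+x)^4(k_2+y)^4} = \frac{3k_1^2k_2^2}{(k_1+x)^4(k_2+y)^4} > 0.
\end{aligned}
$$

Since $f_{xx} > 0$ and the Hessian determinant is positive, the Hessian is positive definite on this domain, so $f$ is convex there.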


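To obtain the convergence in step (b) of (162), note that for $1 \le \ell < \lfloor \xi_r L\rfloor$,

$$\lim\frac{b_{r\ell}^2}{2\ln M} = \lim\frac{nP_\ell}{2\tau_r^2\ln M} > \lim\frac{nP_{\lfloor \xi_r L\rfloor}}{2\tau_r^2\ln M} = \lim\frac{LP_{\lfloor \xi_r L\rfloor}}{2R\tau_r^2\ln 2} = 1, \tag{163}$$

where we have used $nR = L\log_2 M$ and the fact that $\xi_r$ is the supremum of $\xi \in (0,1]$ for which $LP_{\lfloor \xi L\rfloor} > 2R\tau_r^2\ln 2$ (see the proof of Lemma 2). Since $\tau_s \le \tau_r$ for $r < s$ (Lemma 2), we have $b_{s\ell} \ge b_{r\ell}$, so (163) also holds with $b_{s\ell}$ in place of $b_{r\ell}$. Consequently $\ln M - b^2/2 + b^{3/2} \to -\infty$ for $b \in \{b_{r\ell}, b_{s\ell}\}$, and the probability in (162) tends to 1 because $b_{r\ell}, b_{s\ell} \to \infty$.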

Since $\mathcal{E}_{rs,\ell}$ in (161) lies in $[0,1]$ for all $\ell$, (162) implies that $\lim \mathcal{E}_{rs,\ell} = 1$ for $1 \le \ell < \lfloor \xi_r L\rfloor$. Using this in (160) gives the lower bound (158). Together with the upper bound in (157), this proves (154), and hence completes the proof.